braid

braid is a NixOS CLI tool for managing an encrypted btrfs RAID1 NAS. These docs cover end-user workflows, command reference, design decisions, internals, and development practices.

Common tasks

Guides

| Guide | Description |
| --- | --- |
| Install NixOS | Install NixOS itself before setting up braid |
| Getting started | First-time setup: find disks, create pool, unlock |
| Day-to-day NAS usage | Subvolumes, file permissions, Samba shares |
| Auto-unlock | USB keyfile setup for unattended reboots |
| Monitoring and alerts | Disk health alerts, beeper, alert commands |
| Power management | Auto-suspend, Wake-on-LAN, RTC wakeups |
| Fan control | HDD-driven chassis fan control, SATA hotswap |
| UPS | NUT-backed orderly poweroff, preflight safety, live status |
| NixOS configuration | Module options, scrub scheduling, pinned toolchain |
| Sharing and permissions | Storage group, mount permissions, Samba |
| Mounting subvolumes | Expose a btrfs subvolume at a custom path |
| Troubleshooting | ENOSPC balance, paused balance, missing devices |
| Recovery scenarios | Interrupted operations, lost pool.json, degraded mount |

Commands

Commands marked 🧪 are experimental: the idea or implementation is still uncertain and may be removed, replaced, or overhauled before braid v1.0.

| Command | Description |
| --- | --- |
| add | Add disks to the pool or create a new pool |
| remove | Remove a live disk from the pool |
| remove-missing | Forget a dead or missing device entry |
| replace | Replace a live or dead disk |
| unlock | Open LUKS devices and mount the pool |
| lock | Unmount the pool and close LUKS devices |
| seal-mountpoint 🧪 | Seal the offline mountpoint immutable (boot-managed) |
| idle 🧪 | Check if the pool is idle for auto-suspend |
| status | Pool health, disk status, allocation, scrub info |
| doctor | Diagnostic checks for config and pool health |
| monitor 🧪 | Health check for alerting used by systemd timer |
| ack 🧪 | Acknowledge and silence an active alert |
| enroll 🧪 | Enroll a USB keyfile for auto-unlock |
| discover 🧪 | Scan for braid LUKS devices and rebuild pool.json |
| recover 🧪 | Recover from an interrupted operation |
| tui | Interactive dashboard with raw-output Browse tab |
| ups status 🧪 | Live UPS state from NUT, with JSON for scripts |

Design

| Doc | Purpose |
| --- | --- |
| Principles | Authoritative invariants for braid behavior |
| Decision records | Rationale, history, and rejected alternatives |

Internals

| Doc | Purpose |
| --- | --- |
| LUKS unlock | Unlock, header backup, and recovery-message contract |
| Device disappearance | External-tool output for missing device states |
| SATA hot-unplug | Real hardware observations for hot-unplug behavior |
| btrfs notes | btrfs RAID profile, balance, ENOSPC, and LUKS notes |

Development

| Doc | Purpose |
| --- | --- |
| Overview | Development workflow and dependency updates |
| Testing | VM test conventions and framework gotchas |
| TUI snapshots | Ratatui and Insta snapshot review workflow |

Install NixOS

You can also follow the installation guide in the official NixOS manual.

I’ll document the process and the post-install setup, mostly for my own notes.

Download NixOS image

This guide uses the graphical installer’s wizard for partitioning, swap, and user creation. The Minimal ISO is out of scope here – if you prefer it, follow NixOS’ install guide instead.

Format USB stick with NixOS image

  • Download Etcher
  • Plug in USB stick
  • Use Etcher to write your downloaded ISO image to your USB stick

Install NixOS on NAS computer

  • Plug in USB stick and boot from it
  • Choose “Install NixOS (Linux LTS)” – this launches the graphical installer
  • Click through the wizard. It handles partitioning, swap, and user creation for you.
  • Reboot when done; unplug the USB stick so it doesn’t boot from it again

Post-install

Enable SSH

The graphical installer doesn’t enable SSH by default. Log in physically on the NAS console with your user, then add openssh:

sudo nano /etc/nixos/configuration.nix
# add: services.openssh.enable = true;
sudo nixos-rebuild switch

Find the NAS’s LAN IP and SSH in from your laptop:

# On NAS
ip a            # look for LAN ip address

# On your laptop
ssh <user>@<nas-ip>

# Once logged in on NAS, change your password
passwd

The rest of this guide takes place over SSH from your laptop.

Install vim

We’ll add more packages later. For now I just want vim on the system to make the rest of the setup easier.

sudo nano /etc/nixos/configuration.nix

environment.systemPackages = with pkgs; [ vim ];

sudo nixos-rebuild switch

Make git repo for NixOS config

The beauty of nix is that your OS is configured by git-diffable config files.

Instead of editing /etc/nixos/*.nix files, I like to have a ~/world git repo that tracks the NAS’s nix config and push it to danneu/world.

I’ll name my NAS “nasbox” here.

~/world/
├── flake.nix
└── hosts/
    └── nasbox/                        # NAS (NixOS)
        ├── configuration.nix        # System config (boot, networking, services)
        ├── hardware-configuration.nix
        └── home.nix                 # User config (packages, shell, git, etc.)

Let’s stub out that folder tree:

mkdir -p ~/world/hosts/nasbox

We use home-manager to manage user-level config (packages, git, shell, etc.) separately from the system config. This keeps configuration.nix lean — just boot, networking, and services — while home.nix handles everything specific to your user.

In ~/world/flake.nix:

{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-26.05";
    home-manager.url = "github:nix-community/home-manager/release-26.05";
    home-manager.inputs.nixpkgs.follows = "nixpkgs";
  };

  outputs = { nixpkgs, home-manager, ... }: {
    nixosConfigurations.nasbox = nixpkgs.lib.nixosSystem {
      system = "x86_64-linux";
      modules = [
        ./hosts/nasbox/configuration.nix
        home-manager.nixosModules.home-manager
        {
          home-manager.useGlobalPkgs = true;
          home-manager.useUserPackages = true;
          home-manager.users.dan = import ./hosts/nasbox/home.nix;
        }
      ];
    };
  };
}

Copy the generated NixOS config into your world repo:

cp /etc/nixos/configuration.nix ~/world/hosts/nasbox/
cp /etc/nixos/hardware-configuration.nix ~/world/hosts/nasbox/

Make sure hosts/nasbox/configuration.nix imports the hardware config with a relative path:

imports = [ ./hardware-configuration.nix ];

Create hosts/nasbox/home.nix for your user-level config:

{ pkgs, ... }:

{
  home.username = "dan";
  home.homeDirectory = "/home/dan";
  home.stateVersion = "26.05";
  programs.home-manager.enable = true;

  home.sessionVariables = {
    EDITOR = "vim";
    VISUAL = "vim";
  };

  programs.git = {
    enable = true;
    userName = "Your Name";
    userEmail = "you@example.com";
    extraConfig = {
      init.defaultBranch = "master";
      pull.rebase = true;
      push.autoSetupRemote = true;
    };
  };

  home.packages = with pkgs; [
    lazygit   # Terminal UI for git
    ripgrep   # Fast recursive grep (rg)
    fd        # Fast find alternative
    jq        # JSON processor
    htop      # Interactive process viewer
  ];
}

Now rebuild from the flake instead of /etc/nixos:

sudo nixos-rebuild switch --flake ~/world#nasbox

From now on, you edit ~/world/ as your normal user and only sudo for the rebuild. System-level config goes in configuration.nix, user-level config goes in home.nix.

Set up git and push to GitHub

Generate an SSH key on the NAS and add it to GitHub so you can push/pull:

ssh-keygen -t ed25519 -C "nasbox"
cat ~/.ssh/id_ed25519.pub

Copy the public key and add it at GitHub > Settings > SSH and GPG keys > New SSH key.

Then init and push:

cd ~/world
git init
git add -A
git commit -m "initial nixos config"
git remote add origin git@github.com:danneu/world.git
git push -u origin master

Set hostname and pin the IP

Edit ~/world/hosts/nasbox/configuration.nix:

networking.hostName = "nasbox";

Then rebuild:

sudo nixos-rebuild switch --flake ~/world#nasbox

For a stable IP, the simplest approach is a DHCP reservation on your router: look up the NAS’s MAC address (ip link show <iface>) and tell the router to always hand it the same address. The reservation lives on the router, not the host – no nix changes and no nixos-rebuild needed. Bonus: it survives interface renames.

If you’d rather pin it on the host, add to configuration.nix:

networking.interfaces.eno1.ipv4.addresses = [{
  address = "192.168.1.158";
  prefixLength = 24;
}];
networking.defaultGateway = "192.168.1.1";
networking.nameservers = [ "1.1.1.1" "8.8.8.8" ];

Then rebuild:

sudo nixos-rebuild switch --flake ~/world#nasbox

Set up SSH key auth

On your laptop, copy your public key to the NAS:

ssh-copy-id <user>@<nas-ip>

Now you can SSH in without a password. Optionally disable password auth in configuration.nix:

services.openssh = {
  enable = true;
  settings.PasswordAuthentication = false;
};

Set up Claude Code and Codex

numtide/llm-agents.nix is a daily-updated nix flake that packages 40+ AI coding agents, including Claude Code and OpenAI’s Codex CLI. It exposes them via an overlay under pkgs.llm-agents.*.

Add it as a flake input in ~/world/flake.nix and apply its overlay:

{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-26.05";
    home-manager.url = "github:nix-community/home-manager/release-26.05";
    home-manager.inputs.nixpkgs.follows = "nixpkgs";

    # No `follows = "nixpkgs"` -- llm-agents is built against nixpkgs-unstable.
    llm-agents.url = "github:numtide/llm-agents.nix";
  };

  outputs = { nixpkgs, home-manager, llm-agents, ... }: {
    nixosConfigurations.nasbox = nixpkgs.lib.nixosSystem {
      system = "x86_64-linux";
      modules = [
        ./hosts/nasbox/configuration.nix
        { nixpkgs.overlays = [ llm-agents.overlays.default ]; }
        home-manager.nixosModules.home-manager
        {
          home-manager.useGlobalPkgs = true;
          home-manager.useUserPackages = true;
          home-manager.users.dan = import ./hosts/nasbox/home.nix;
        }
      ];
    };
  };
}

First, allow unfree packages in hosts/nasbox/configuration.nix:

nixpkgs.config.allowUnfree = true;

Then add both binaries to the existing home.packages list in hosts/nasbox/home.nix:

home.packages = with pkgs; [
  lazygit
  ripgrep
  fd
  jq
  htop
] ++ (with pkgs.llm-agents; [
    claude-code
    codex
  ]);

Rebuild:

sudo nixos-rebuild switch --flake ~/world#nasbox

Now you can run claude and codex from anywhere on the NAS.

Optional: use the numtide binary cache

By default, source-built agents like codex will compile locally on first install. To pull prebuilt binaries instead, add the numtide cache to your system config (in hosts/nasbox/configuration.nix):

nix.settings = {
  extra-substituters = [ "https://cache.numtide.com" ];
  extra-trusted-public-keys = [
    "niks3.numtide.com-1:DTx8wZduET09hRmMtKdQDxNNthLQETkc/yaX7M4qK0g="
  ];
};

Then rebuild. Subsequent installs will fetch from the cache.

Next steps

At this point you have a working NixOS machine with SSH access, a stable IP, Claude Code + Codex, and a git-tracked config. Next: add braid to your NixOS config.


Getting started

This guide walks you through first-time braid setup: installing the NixOS module, finding your disks, creating a pool, and unlocking it.

Read this if you have a fresh NixOS machine with empty drives and want to set up an encrypted RAID1 NAS.

What braid manages

braid owns two things:

  • LUKS encryption – each drive is individually encrypted with a shared passphrase. Keys are never stored on disk.
  • btrfs RAID1 – your encrypted drives form a single filesystem with automatic redundancy and self-healing checksums.

The NixOS module provides the systemd units, mount point, and toolchain. The CLI owns which disks are in the pool – adding or removing a drive is a braid command, not a nixos-rebuild.

Pool membership lives in /var/lib/braid/pool.json. This file is created by braid add and read by braid unlock. It is keyed by each member’s LUKS UUID; the disk name is stored inside each entry for commands and display.

Install the NixOS module

Add braid to your flake inputs and import the module:

# flake.nix
{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-26.05";
    braid.url = "github:danneu/braid?ref=release";
  };

  outputs = { nixpkgs, braid, ... }: {
    nixosConfigurations.myhost = nixpkgs.lib.nixosSystem {
      system = "x86_64-linux";
      modules = [
        braid.nixosModules.default
        ./configuration.nix
      ];
    };
  };
}

?ref=release follows braid’s release channel; nix flake update braid upgrades to the newest release. The snippet also keeps braid on its own pinned nixpkgs (no follows override) on purpose, so braid matches its binary cache (next section).

Minimal configuration:

# configuration.nix
braid = {
  enable = true;
  mountPoint = "/mnt/storage";  # default
};

Only braid.enable = true is required. nixosModules.default defaults braid.package to braid’s pinned braid-cli-unwrapped; mountPoint defaults to /mnt/storage.

Binary cache

braid publishes prebuilt binaries to a public Cachix cache. Add it before rebuilding so the NAS pulls the CLI instead of compiling Rust:

# configuration.nix
nix.settings = {
  extra-substituters = [ "https://braid.cachix.org" ];
  extra-trusted-public-keys = [ "braid.cachix.org-1:I/p7fx1z5n0+O80KzMuT7aXRdkVyHr/buZKaBu7HvJs=" ];
};

This relies on the no-follows input above – the cache only matches braid’s pinned nixpkgs. See NixOS configuration.

Rebuild and switch:

sudo nixos-rebuild switch

Find your disks

Use lsblk to identify the drives you want to add:

lsblk -d -o NAME,SIZE,MODEL,ID-LINK

Example output:

NAME  SIZE MODEL               ID-LINK
sda   12T  TOSHIBA MN07ACA12T  ata-TOSHIBA_MN07ACA12T_XXXX
sdb   12T  TOSHIBA MN07ACA12T  ata-TOSHIBA_MN07ACA12T_YYYY
sdc   12T  TOSHIBA MN07ACA12T  ata-TOSHIBA_MN07ACA12T_ZZZZ
sdd  500G  Samsung SSD 860     ata-Samsung_SSD_860_AAAA      # boot drive -- leave this alone

You need the ID-LINK values. braid always uses /dev/disk/by-id/ paths – never /dev/sdX, which can change between reboots.
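As a rough illustration of that rule, a script that wraps braid commands could reject unstable paths before they reach a command. The helper name is_by_id_path is hypothetical and not part of braid, which does its own validation; treat this as a sketch only.

```shell
# Hypothetical helper (not part of braid): succeed only for /dev/disk/by-id/ paths.
is_by_id_path() {
  case "$1" in
    /dev/disk/by-id/?*) return 0 ;;  # stable name derived from model and serial
    *) return 1 ;;                   # e.g. /dev/sda, which can change between reboots
  esac
}

# Demo: one stable path and one unstable path.
for p in /dev/disk/by-id/ata-TOSHIBA_MN07ACA12T_XXXX /dev/sda; do
  if is_by_id_path "$p"; then
    echo "ok:     $p"
  else
    echo "reject: $p"
  fi
done
```

The check is purely textual. It does not confirm that the device exists, so a typo in the serial still has to be caught by the command that uses the path.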

Create the pool

Add your drives with braid add. Each drive gets a short name you choose and a by-id path:

sudo braid add \
  toshiba1=/dev/disk/by-id/ata-TOSHIBA_MN07ACA12T_XXXX \
  toshiba2=/dev/disk/by-id/ata-TOSHIBA_MN07ACA12T_YYYY \
  toshiba3=/dev/disk/by-id/ata-TOSHIBA_MN07ACA12T_ZZZZ

braid will:

  1. Ask you to set a LUKS passphrase (used for all drives).
  2. Format each drive with LUKS encryption.
  3. Create a btrfs RAID1 filesystem across all drives.
  4. Mount the pool at /mnt/storage.
  5. Write pool membership to /var/lib/braid/pool.json.

All drives join the same btrfs RAID1 filesystem. btrfs RAID1 keeps exactly 2 copies of every block regardless of how many drives you add, so the pool tolerates a single drive failure – a 3-drive pool tolerates the same single failure as a 2-drive pool, with more usable capacity. See Day-to-day usage for what additional drives buy you and how to add them later.

The disk names (toshiba1, toshiba2, etc.) are permanent presentation labels used in all future commands. Pick something short and meaningful. braid uses the LUKS UUID, not the name or LUKS label, as the persistent disk identity.

After braid add completes, the pool is online and mounted. You can start using it immediately:

ls /mnt/storage/
cp photos/* /mnt/storage/photos/

Unlock after reboot

When the NAS reboots, the pool is offline – LUKS drives are closed and nothing is mounted. This is by design: your data stays encrypted until you explicitly unlock it.

SSH into the NAS and unlock:

ssh user@nas
sudo braid unlock

braid prompts for your LUKS passphrase, opens all drives, assembles the btrfs pool, and mounts it.

Check pool health

sudo braid status

This shows:

  • Pool state (online/offline)
  • Each disk’s health and LUKS status
  • Disk allocation and free space
  • Scrub status
  • Any active alerts

Lock the pool

When you want to take the pool offline (unmount and close LUKS):

sudo braid lock

This is optional – the pool locks automatically on shutdown. Manual locking is useful before maintenance or if you want to ensure the drives are encrypted at rest while the machine stays on.


Day-to-day NAS usage

This guide covers normal operation: the reboot cycle, checking on your pool, adding disks over time, organizing your data with subvolumes, and good operator habits.

Read this after you have completed Getting started and have a working pool.

The daily cycle

A typical NAS session looks like this:

NAS powers on
  -> boots to login (pool offline, drives encrypted)
  -> SSH in
  -> sudo braid unlock (enter passphrase)
  -> pool online at /mnt/storage
  -> use it (copy files, stream media, backups)
  -> pool locks automatically on shutdown

If you have auto-unlock configured with a USB keyfile, the unlock step happens automatically at boot.

Unlock

ssh user@nas
sudo braid unlock

braid prompts for your LUKS passphrase, opens all drives, and mounts the pool.

Use the pool

The pool is a normal directory at /mnt/storage (or whatever you set braid.mountPoint to). Copy files, create directories, share via Samba/NFS – it works like any other filesystem.

cp ~/photos/* /mnt/storage/photos/
rsync -av ~/documents/ /mnt/storage/documents/

Lock

sudo braid lock

This unmounts the pool and closes all LUKS devices. The pool locks automatically on shutdown, so manual locking is only needed if you want drives encrypted while the machine stays on.

Checking pool health

Run braid status periodically to check on your pool:

sudo braid status

This shows pool state, disk health, allocation, scrub history, and any active alerts. Make a habit of glancing at this after unlocking, especially if the NAS has been running unattended.

For an interactive view:

sudo braid tui

The TUI dashboard shows pool health, disk status, balance progress, and SMART data in a live-updating terminal interface.

Organizing data with subvolumes

btrfs subvolumes are the right way to organize different categories of data on your NAS. Think of them as lightweight partitions within the pool.

Subvolumes vs directories

A plain directory works fine for storing files, but subvolumes give you:

  • Independent snapshots – snapshot your documents without snapshotting your movie library.
  • Per-subvolume quotas – limit how much space a category can use (optional).
  • Selective backup – send/receive individual subvolumes to an external drive.
  • No cost upfront – subvolumes are free to create. They share the pool’s space with no pre-allocated size.

There is no downside to creating subvolumes early. If you later decide you do not need snapshots for a category, the subvolume still works exactly like a directory.

Creating subvolumes

Create subvolumes for your major data categories:

sudo btrfs subvolume create /mnt/storage/documents
sudo btrfs subvolume create /mnt/storage/photos
sudo btrfs subvolume create /mnt/storage/movies
sudo btrfs subvolume create /mnt/storage/music
sudo btrfs subvolume create /mnt/storage/backups

Then use them like normal directories:

cp ~/report.pdf /mnt/storage/documents/
rsync -av ~/Photos/ /mnt/storage/photos/

Snapshots

Snapshot a subvolume to create a point-in-time copy:

sudo btrfs subvolume snapshot -r /mnt/storage/documents /mnt/storage/.snapshots/documents-2026-04-09

The -r flag makes it read-only, which is best practice for backup snapshots. Snapshots are nearly instant and use no extra space until the original data changes. Deleting a file from the original subvolume does not reclaim its blocks while any snapshot still references them. To free that space, delete the snapshots holding the data with sudo btrfs subvolume delete /mnt/storage/.snapshots/<name>.
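For regular snapshots it can help to build the dated name in one place. The sketch below assumes the /mnt/storage/.snapshots layout used in the example above; the helper name snapshot_path is hypothetical. It prints the commands instead of running them so they can be reviewed first.

```shell
# Hypothetical helper: build the snapshot path for a subvolume and a date stamp.
snapshot_path() {
  pool=$1      # pool mount point, e.g. /mnt/storage
  subvol=$2    # subvolume name, e.g. documents
  day=$3       # date stamp, e.g. 2026-04-09
  printf '%s/.snapshots/%s-%s\n' "$pool" "$subvol" "$day"
}

dest=$(snapshot_path /mnt/storage documents "$(date +%Y-%m-%d)")

# Print the commands rather than executing them.
echo "sudo mkdir -p /mnt/storage/.snapshots"
echo "sudo btrfs subvolume snapshot -r /mnt/storage/documents $dest"
```

The mkdir line is there because btrfs subvolume snapshot needs the destination's parent directory to exist already.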

Listing subvolumes

sudo btrfs subvolume list /mnt/storage

To mount a specific subvolume at a custom path, for example for a service or a friendlier path under /home, see Mounting subvolumes.

Adding disks over time

You can add new drives to an existing pool without rebuilding or reformatting:

# Find the new drive
lsblk -d -o NAME,SIZE,MODEL,ID-LINK

# Add it
sudo braid add newdisk=/dev/disk/by-id/ata-NEWDISK_SERIAL

braid formats the new drive with LUKS (using your existing passphrase), adds it to the btrfs pool, and rebalances data across all drives. No nixos-rebuild required.

The balance runs in the foreground – braid add holds the terminal and does not return until it finishes, which can take hours on a large pool. braid shows live balance progress while it runs.

btrfs RAID1 keeps exactly 2 copies of every block no matter how many drives the pool has. A 3rd or 4th drive gives you more usable capacity, but it does not increase fault tolerance – the pool still tolerates a single drive failure, the same as a 2-drive pool. See Decision 001 for the rationale.
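To get a feel for what an extra drive adds, usable space with two copies of every block can be approximated as the smaller of half the total and the total minus the largest drive. This is a back-of-the-envelope sketch, not something braid computes this way: it ignores metadata overhead and chunk granularity, and the function name raid1_usable is made up for illustration.

```shell
# Rough usable capacity of a btrfs RAID1 pool (2 copies of every block).
# Arguments are whole-number drive sizes in one unit (e.g. TB).
raid1_usable() {
  total=0
  largest=0
  for size in "$@"; do
    total=$((total + size))
    if [ "$size" -gt "$largest" ]; then largest=$size; fi
  done
  half=$((total / 2))          # every block is stored twice
  rest=$((total - largest))    # room on the other drives for the second copies
  if [ "$half" -lt "$rest" ]; then echo "$half"; else echo "$rest"; fi
}

raid1_usable 12 12        # two 12T drives
raid1_usable 12 12 12     # three 12T drives
raid1_usable 12 4         # mismatched drives: limited by the smaller one
```

Going from two to three equal drives raises the estimate, while fault tolerance stays at one drive in every case.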

Responding to alerts

If the NAS beeps (or sends you an alert via a custom command), something needs attention:

  1. SSH in and check status:

    sudo braid status
    
  2. The status output shows what triggered the alert: btrfs device errors, a missing disk, or a SMART warning.

  3. Investigate and fix the issue (replace a failing disk, check cables, etc.).

  4. Once resolved, acknowledge the alert to silence it:

    sudo braid ack
    

See Monitoring and alerts for details on how alerts work.

Good operator habits

  • Check braid status after unlocking – a quick glance catches problems early.
  • Keep LUKS header backups – braid stores header backups in /var/lib/braid/luks-headers/ after operations that modify LUKS headers. Copy each .luksheader file off the NAS to a separate location, then delete the local file (braid status warns until they are removed). If a drive’s LUKS header is corrupted and you have no off-system backup, the data on that drive is unrecoverable.
  • Run braid doctor – periodically check for configuration problems:
    sudo braid doctor
    
  • Let scrubs complete – braid runs monthly scrubs by default. Scrubs verify every block’s checksum and repair corruption from redundant copies. braid starts them at low CPU priority (Nice=19) and idle I/O priority (IOSchedulingClass=idle). The CPU priority always applies; the I/O priority is best-effort – how strongly the kernel honors it depends on your block-layer I/O scheduler – so do not treat it as a guarantee that scrubs will never affect interactive workloads. The pool stays online throughout. If scrubs noticeably impact Samba, NFS, or local use on your hardware, retime them with braid.autoScrub.interval (any systemd calendar expression – e.g. "Sun *-*-* 02:00:00") to land in an off-peak window. Do not interrupt a scrub in progress.
  • Create subvolumes early – there is no cost to creating them upfront, and you cannot convert a directory to a subvolume later without copying the data.


Auto-unlock

This guide covers setting up unattended unlock with a USB keyfile so the pool comes online automatically at boot.

Read this if you want the NAS to unlock without SSH-ing in to type a passphrase – for example, after a power outage or scheduled reboot.

How it works

By default, braid requires a passphrase to unlock. Auto-unlock adds a binary keyfile stored on a USB drive as a second LUKS unlock method. At boot, braid mounts the USB, reads the keyfile, unlocks all drives, and unmounts the USB.

The passphrase (LUKS slot 0) still works for manual unlock. The keyfile lives in LUKS slot 1.

Boot behavior

With USB key present: NAS boots, braid-auto-unlock.service runs, mounts USB, unlocks pool, unmounts USB. Pool is online by the time you can SSH in.

Without USB key present: The service waits for the USB device (up to timeoutSec, default 5 seconds), then skips gracefully. The pool stays locked. You SSH in and sudo braid unlock with your passphrase as usual.

This means removing the USB key is all it takes to go back to manual unlock.
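The wait-then-skip behavior can be pictured as a bounded polling loop. The following is only a sketch of the idea, not how braid-auto-unlock.service is implemented; the function name, the one-second poll interval, and the example device path are all assumptions.

```shell
# Hypothetical sketch: wait up to timeout_sec for a path to appear, polling once per second.
# Returns 0 if the path appeared, 1 if the timeout expired.
# Uses -e so the demo works on any path; a stricter check could use -b (block device).
wait_for_device() {
  dev=$1
  timeout_sec=$2
  waited=0
  while [ ! -e "$dev" ]; do
    if [ "$waited" -ge "$timeout_sec" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# Demo with a made-up device path and a 1-second timeout.
if wait_for_device /dev/disk/by-id/usb-EXAMPLE_KEY 1; then
  echo "key device present: proceed with auto-unlock"
else
  echo "key device absent: skip, pool stays locked"
fi
```

A timeout that ends in a clean skip rather than a failure is what lets the machine finish booting normally without the key.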

Step 1: Generate and enroll the keyfile

Plug in a USB drive and find its by-id path:

lsblk -d -o NAME,SIZE,MODEL,ID-LINK

Mount it somewhere temporary:

sudo mkdir -p /mnt/usb
sudo mount /dev/disk/by-id/usb-SanDisk_Cruzer_XXXX-0:0-part1 /mnt/usb

Generate a random keyfile and enroll it into all pool disks:

sudo braid enroll /mnt/usb --generate

--generate requires /mnt/usb to already be mounted. This creates a 4096-byte random file at /mnt/usb/braid.key and enrolls it into LUKS slot 1 on every disk in the pool. braid asks for your existing passphrase to authorize the enrollment.
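Before unmounting, a quick sanity check can confirm the keyfile landed where expected. The helper below is hypothetical and not a braid command; it only checks that braid.key exists in the given directory and is 4096 bytes, as described above. The demo runs against a throwaway directory rather than the real USB mount.

```shell
# Hypothetical check: does DIR contain a braid.key of exactly 4096 bytes?
check_keyfile() {
  key="$1/braid.key"
  [ -f "$key" ] || { echo "missing: $key"; return 1; }
  size=$(( $(wc -c < "$key") ))   # arithmetic strips any padding wc adds
  if [ "$size" -eq 4096 ]; then
    echo "ok: $key ($size bytes)"
  else
    echo "unexpected size: $key ($size bytes)"
    return 1
  fi
}

# Demo: create a 4096-byte stand-in file in a temporary directory and check it.
demo_dir=$(mktemp -d)
dd if=/dev/urandom of="$demo_dir/braid.key" bs=4096 count=1 2>/dev/null
check_keyfile "$demo_dir"
rm -rf "$demo_dir"
```

On the NAS the call would be check_keyfile /mnt/usb, run with sudo if the mount is only readable by root.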

Unmount the USB:

sudo umount /mnt/usb

Enroll during braid add

If you are adding a new disk and already have a keyfile on USB, you can enroll in one step:

sudo braid add newdisk=/dev/disk/by-id/ata-NEWDISK_SERIAL --enroll /mnt/usb

The --enroll flag points to the directory containing braid.key. The new disk gets both the passphrase and the keyfile.

Step 2: Enable auto-unlock in NixOS config

Find your USB key’s by-id path (use the raw device, not a partition, if your USB has no partition table):

lsblk -d -o NAME,SIZE,MODEL,ID-LINK

Add to your NixOS configuration:

# configuration.nix
braid = {
  enable = true;

  autoUnlock = {
    enable = true;
    keyDevice = "/dev/disk/by-id/usb-SanDisk_Cruzer_XXXX-0:0-part1";
  };
};

Rebuild:

sudo nixos-rebuild switch

Configuration options

| Option | Default | Description |
| --- | --- | --- |
| autoUnlock.enable | false | Enable USB keyfile auto-unlock |
| autoUnlock.keyDevice | | Block device for the USB key (must use /dev/disk/by-id/ path) |
| autoUnlock.timeoutSec | 5 | Seconds to wait for USB device before giving up |
| autoUnlock.allowDegraded | false | Mount even if some drives are missing (degraded mode) |

keyDevice must be a /dev/disk/by-id/ path. braid rejects /dev/sdX paths because they can change between reboots.

Degraded mode

By default, auto-unlock refuses to mount if any pool drive is missing. This prevents the pool from silently running with reduced or no redundancy.

If you want the pool to come online even with a missing drive (for example, if a drive has failed and you plan to replace it), set:

autoUnlock.allowDegraded = true;

Redundancy is reduced until the drive is replaced and data rebalances.

Step 3: Test it

Reboot the NAS with the USB key plugged in:

sudo reboot

After boot, SSH in and check:

sudo braid status

The pool should be online. Check the journal to confirm auto-unlock ran:

journalctl -u braid-auto-unlock.service

Then test without the USB key: remove it, reboot, and confirm the pool stays locked until you manually unlock.

Security considerations

The keyfile on the USB drive can unlock your pool without a passphrase. Treat it like a physical key:

  • Remove the USB after boot. The auto-unlock service unmounts the USB immediately after reading the keyfile, but physically removing it ensures no one can copy the key from a running system.
  • Store the USB securely. If someone has physical access to both the USB key and the NAS drives, they can decrypt your data.
  • Keep a backup of the keyfile. If you lose the USB key, you still have your passphrase. But if you want another auto-unlock USB, you need to braid enroll again.

LUKS header backups

Enrolling a keyfile modifies the LUKS header on each drive (it adds slot 1). braid stores LUKS header backups in /var/lib/braid/luks-headers/ as a transient byproduct.

Copy each .luksheader file to a separate location (external drive, another machine), then delete the local file. braid status warns until the local copies are removed. If a drive’s LUKS header is corrupted, the off-system backup is the only way to recover access to that drive’s data.


Monitoring and alerts

This guide covers how braid monitors disk health and notifies you when something goes wrong.

Read this if you want to understand the alert system, configure notifications, or respond to an alert.

How monitoring works

braid runs a health check every 5 minutes via a systemd timer. The check looks at three things:

  1. btrfs device stats – non-zero error counters (read, write, flush, corruption, generation errors) on any drive.
  2. Missing devices – a drive that should be in the pool but is not present.
  3. SMART alerts – smartd detected a SMART health warning on a drive.

A scrub that discovers unrepairable read, checksum, or generation errors increments the same btrfs device stats, so it follows the same beep and braid status flow as an everyday I/O error.

If any check triggers, braid activates an alert.

What happens on alert

When braid monitor detects an issue (exit code 1), the systemd wrapper starts braid-alert.service, which:

  • Beeps the PC speaker (if enabled) until acknowledged. The cadence starts at 5 seconds and backs off exponentially (5s, 10s, 20s, 40s, …) up to once every 15 minutes, so the early beeps are urgent but an ignored alert doesn’t stay obnoxious.
  • Runs your custom alert command (if configured).

The beeping is intentionally persistent and annoying – you should not be able to ignore a disk problem on a NAS that holds your data.
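The cadence described above can be sketched as a doubling delay with a ceiling. This assumes the delay simply doubles from 5 seconds and is clamped at 900 seconds (15 minutes); braid's exact schedule near the cap may differ, and beep_delay is a made-up name.

```shell
# Delay in seconds before the next beep after n earlier beeps:
# 5, 10, 20, 40, ... clamped at 900 (15 minutes).
beep_delay() {
  n=$1
  delay=5
  while [ "$n" -gt 0 ] && [ "$delay" -lt 900 ]; do
    delay=$((delay * 2))
    n=$((n - 1))
  done
  if [ "$delay" -gt 900 ]; then delay=900; fi
  echo "$delay"
}

# Print the first ten delays on one line.
for n in 0 1 2 3 4 5 6 7 8 9; do
  printf '%s ' "$(beep_delay "$n")"
done
echo
```

Under these assumptions the ceiling is reached after eight doublings, a little over 20 minutes into an unacknowledged alert.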

Alerts are latched

An alert stays active until you acknowledge it with braid ack, even if the triggering condition goes away. This is by design: “a disk had errors” is worth investigating even if the error count stopped growing.
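Latching is easy to picture as a flag that a bad check can set but a good check never clears. The sketch below is a toy model with a demo flag file in the temp directory; the file name and function names are hypothetical and do not reflect how braid stores alert state.

```shell
# Hypothetical demo flag file; braid keeps its real alert state elsewhere.
ALERT_FLAG="${TMPDIR:-/tmp}/latched-alert-demo.$$"

monitor_cycle() {           # $1 is the health check result: "ok" or "bad"
  if [ "$1" = "bad" ]; then
    : > "$ALERT_FLAG"       # raise the alert; an "ok" result never clears it
  fi
}
alert_active() { [ -e "$ALERT_FLAG" ]; }
ack() { rm -f "$ALERT_FLAG"; }   # only an explicit acknowledgement clears it

monitor_cycle bad
monitor_cycle ok            # the triggering condition went away
alert_active && echo "still alerting after the condition cleared"
ack
alert_active || echo "silenced after ack"
```

Because only ack removes the flag, a problem that appears and disappears between two logins still leaves a trace to investigate.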

Configuration

Monitoring is on by default when braid.enable = true. Here is the full set of options:

braid = {
  enable = true;

  monitor = {
    enable = true;        # default: true
    interval = "5min";    # default: "5min" (systemd time span)
    beep = true;          # default: true (PC speaker alert)
    alertCommand = null;  # default: null (optional custom command)
  };
};

Options

| Option | Default | Description |
| --- | --- | --- |
| monitor.enable | true | Enable disk health monitoring |
| monitor.interval | "5min" | How often to check (systemd time span: "5min", "30s", "1h") |
| monitor.beep | true | Beep the PC speaker on alert |
| monitor.alertCommand | null | Custom command to run on alert (in addition to beep) |

Custom alert commands

Set monitor.alertCommand to run a script when an alert fires. This runs in addition to (not instead of) the beep:

braid.monitor.alertCommand = "/home/user/scripts/send-pushover-alert.sh";

The command runs as root. It should be idempotent – it may fire on every monitor cycle while the alert is active.

Disabling the beep

If you do not have a PC speaker or prefer silent alerts:

braid.monitor.beep = false;

You probably want to set a custom alertCommand if you disable the beep, otherwise alerts are silent and only visible in braid status.

SMART integration

braid automatically configures smartd to monitor all drives. When smartd detects a SMART health issue, it writes a flag file that braid’s monitor picks up on the next cycle.

You do not need to configure smartd yourself – braid sets it up with sensible defaults. The NixOS services.smartd options are still available if you need to customize behavior.

Alert workflow

When the NAS beeps (or your alert command fires):

1. SSH in and check status

ssh user@nas
sudo braid status

The output shows a banner when alerts are active and lists the causes:

  • BtrfsDeviceErrors – a specific drive has non-zero error counters. Could be a bad cable, a dying drive, or a transient issue.
  • MissingDevice – a drive is missing from the pool. Check if a cable came loose or if the drive failed.
  • SmartdAlert – SMART reports a health warning. The drive may be failing.

2. Investigate

For device errors, check if they are growing:

# Wait a few minutes and check again
sudo braid status

Error counts that hold steady after a reboot often point to a transient event (power event, cable issue). Counts that keep growing suggest the drive is failing.
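If you prefer numbers to eyeballing, the raw counters can be totalled and compared between two samples. This sketch parses the line format printed by btrfs device stats, where each line ends in a counter value; the device names in the canned sample are made up and the helper names are hypothetical.

```shell
# Sum every counter in `btrfs device stats` output read from stdin.
# Each line looks like: [/dev/mapper/name].read_io_errs   0
sum_device_errors() {
  awk '{ total += $NF } END { print total + 0 }'
}

# Compare two totals taken a few minutes apart.
classify_errors() {
  if [ "$2" -gt "$1" ]; then echo "growing"; else echo "steady"; fi
}

# Canned sample so the sketch runs without a pool. On a live system the input
# would come from: sudo btrfs device stats /mnt/storage
before=$(printf '%s\n' \
  '[/dev/mapper/toshiba1].write_io_errs    0' \
  '[/dev/mapper/toshiba1].read_io_errs     3' \
  '[/dev/mapper/toshiba1].flush_io_errs    0' \
  '[/dev/mapper/toshiba1].corruption_errs  0' \
  '[/dev/mapper/toshiba1].generation_errs  0' | sum_device_errors)

echo "total errors: $before"
classify_errors "$before" "$before"
```

A single total hides which drive and which counter moved, so braid status remains the place to look once the total changes.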

For a missing device, check physical connections. If the drive is dead, plan a replacement:

sudo braid replace --old deadname --new newname=/dev/disk/by-id/ata-NEW_SERIAL

3. Acknowledge

Once you have investigated and resolved (or accepted) the issue:

sudo braid ack

This silences the beep and resets the alert baseline. New errors after ack will trigger a fresh alert.

Checking monitor status

View the monitor service logs:

journalctl -u braid-monitor.service --since "1 hour ago"

View the alert service:

journalctl -u braid-alert.service

Check if the monitor timer is active:

systemctl status braid-monitor.timer

How the pieces fit together

braid-monitor.timer (every 5 min)
  -> braid-monitor.service
    -> braid monitor (exit 0 = ok/offline/lock-contended, 1 = alert, 2 = setup error)
      -> on exit 1: start braid-alert.service
        -> beep (PC speaker, 5s -> 10s -> ... -> 15min)
        -> alertCommand (if configured)

smartd (always running)
  -> detects SMART issue
  -> writes /var/lib/braid/smartd-alert flag
  -> starts braid-alert.service
  -> next braid monitor cycle picks up the flag

braid ack
  -> clears alert state
  -> braid-alert.service stops (beeping stops)


Power management

This guide covers auto-suspend, Wake-on-LAN (WoL), and troubleshooting the hardware and software chain that makes it all work.

Read this if you want your NAS to sleep when idle and wake on demand.

How auto-suspend works

When enabled, braid uses autosuspend to suspend the entire NAS to RAM when idle. This stops all drives, the CPU, and fans – the machine draws almost no power. Wake it with a Wake-on-LAN magic packet from any device on the network.

Suspend-to-RAM preserves LUKS keys and the mounted btrfs pool in memory. When the NAS wakes, the pool is immediately available – no re-unlock needed.

What counts as activity

The NAS stays awake while any of these are true:

| Check | What it detects |
| --- | --- |
| braid idle | scrub plus any btrfs kernel exclusive operation (balance, device add/remove/replace, resize, swap activate) – the latter via /sys/fs/btrfs/<fsid>/exclusive_operation |
| braid wol-ready | configured wired NIC currently reports Wake-on: g; if WoL is disabled or unverifiable, auto-suspend is blocked |
| SSH | Active SSH connections (port 22) |
| Local sessions | TTY, X11, or Wayland sessions (via logind) |
| Samba | Active SMB clients (auto-detected, only if Samba is enabled) |
| NFS | Active NFS connections on port 2049 (auto-detected, only if NFS server is enabled) |

If all checks pass (everything idle) for the configured idle time (default 15 minutes), the NAS suspends.

The WoL check gates braid’s auto-suspend path only. Manual sudo systemctl suspend remains available for maintenance and testing, but it bypasses braid’s pre-suspend WoL check.

Scrub wakeups

The monthly btrfs scrub timer is registered as an autosuspend wakeup source. If the NAS is asleep when a scrub is due, it wakes via RTC alarm, runs the scrub, and suspends again when idle.

Configuration

braid = {
  enable = true;

  autoSuspend = {
    enable = true;
    idleTime = 900;          # seconds before suspend (default: 900 = 15 min)
    wolInterface = "eno1";   # required -- your wired ethernet interface
  };
};

Options

| Option | Default | Description |
| --- | --- | --- |
| autoSuspend.enable | false | Enable auto-suspend when idle |
| autoSuspend.idleTime | 900 | Seconds of idle time before suspending |
| autoSuspend.wolInterface | | Wired ethernet interface for Wake-on-LAN (required) |

wolInterface is mandatory. Without WoL, a suspended NAS is unreachable until someone presses the power button. Find your interface with:

ip link

Look for your wired ethernet interface (usually eno1, enp1s0, or similar). WiFi interfaces (wl*) are rejected – WoL requires wired ethernet.

Hardware compatibility

Intel ethernet controllers (e.g., X540, I210, I225) have reliable WoL support with the in-kernel ixgbe, igb, and igc drivers respectively. These are the lowest-risk choice for a NAS that needs reliable remote wakeup.

Avoid: Aquantia/Marvell AQC107

The AQC107 (atlantic driver) has known WoL reliability issues on Linux. If WoL is important to you, avoid this chipset.

RTL8125 (Realtek 2.5GbE)

The Realtek RTL8125 works for WoL but requires the vendor r8125 driver instead of the in-kernel r8169. See the troubleshooting section below.

WoL troubleshooting

WoL involves a chain from BIOS to NIC driver to PCI bridge. When it does not work, you need to figure out which link in the chain is broken. Work through these steps in order.

1. Check BIOS settings

WoL must be enabled in your BIOS/UEFI. The exact option names vary by vendor, but look for:

  • Wake on LAN or Wake on PCI/PCIe – enable this.
  • ErP Ready or ErP Lot 6 – disable this. ErP is an EU power-saving regulation that cuts standby power below what the NIC needs to listen for magic packets. If ErP is on, WoL cannot work.
  • Deep Sleep – disable if present. Similar to ErP, this cuts power to PCIe slots during standby.

2. Test basic suspend first

Before debugging WoL, verify the NAS can suspend and wake at all:

# Check available sleep states
cat /sys/power/state

You should see mem in the output. Test a manual suspend and wake with the power button:

sudo systemctl suspend

Press the power button to wake. If this does not work, suspend itself is broken (check ACPI settings in BIOS).

Identify spurious wake sources

Sometimes the NAS wakes immediately after suspend. Check what woke it:

# What woke the system last time?
journalctl -b -k | grep -i "wake"

# List ACPI wake sources
cat /proc/acpi/wakeup

The output looks like:

Device  S-state   Status   Sysfs node
XHC0      S3    *enabled   pci:0000:00:14.0
GLAN      S4    *enabled   pci:0000:00:1f.6
...

Disable wake sources one at a time to find the culprit. Use a binary search – disable half, test, narrow down.
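On boards with many entries it helps to list only the sources that are currently armed. A minimal sketch, assuming the column layout shown above; `enabled_wake_sources` is a hypothetical helper name.

```shell
# Print the device names whose Status column reads "*enabled".
# Input: the contents of /proc/acpi/wakeup on stdin (header line skipped).
enabled_wake_sources() {
  awk 'NR > 1 && $3 == "*enabled" { print $1 }'
}

# Example use:
#   enabled_wake_sources < /proc/acpi/wakeup
```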

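To disable a wake source temporarily, write its name to the file. Each write toggles the entry between enabled and disabled, so check the Status column afterwards. The change resets on reboot: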

echo XHC0 | sudo tee /proc/acpi/wakeup

Once you find the problematic device, disable it permanently via a udev rule in your NixOS config:

services.udev.extraRules = ''
  # Disable USB controller wake (XHC0 causes spurious wakeups)
  ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x8086", ATTR{device}=="0xa0ed", ATTR{power/wakeup}="disabled"
'';

Find the vendor and device IDs from lspci -nn for the corresponding PCI device.

3. Verify WoL is enabled on the NIC

After rebuild with braid.autoSuspend.wolInterface set, verify with doctor:

sudo braid doctor

Expected row:

[ok]   wake-on-lan     eno1 reports Wake-on: g (magic packet armed)

Wake-on: g means WoL is active (magic packet mode). If doctor reports Wake-on: d (disabled), the NixOS config is not taking effect – check that you rebuilt, that the interface name is correct, and that BIOS/driver WoL settings allow wake.

With autoSuspend enabled, braid also checks this before every automatic suspend. If the NAS is idle but does not sleep, run sudo braid doctor and inspect the wake-on-lan row.
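If you want to cross-check the doctor row against the NIC directly, the same flag is visible in ethtool's output. The helper below is a sketch with a hypothetical name (`wol_mode`); it only extracts the active Wake-on value and ignores the "Supports Wake-on" line.

```shell
# Print the active Wake-on flags from `ethtool <iface>` output on stdin.
# Matches the line whose first field is exactly "Wake-on:", so the
# "Supports Wake-on: ..." capability line is not picked up.
wol_mode() {
  awk '$1 == "Wake-on:" { print $2 }'
}

# Example use:
#   sudo ethtool eno1 | wol_mode    # "g" means magic-packet wake is armed
```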

4. Test WoL from another machine

From a different machine on the same network, send a magic packet:

# Install wakeonlan tool (on the sending machine)
# NixOS: nix-shell -p wakeonlan
# macOS: brew install wakeonlan

# Get the NAS MAC address (on the NAS, before suspending)
ip link show eno1
# look for "link/ether xx:xx:xx:xx:xx:xx"

# Suspend the NAS
ssh user@nas sudo systemctl suspend

# Send the magic packet (from the other machine)
wakeonlan xx:xx:xx:xx:xx:xx

If the NAS wakes, WoL is working. If not, continue to the next steps.
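For context when debugging with a packet capture: a magic packet is 6 bytes of 0xFF followed by the target MAC address repeated 16 times (102 bytes in total), usually sent as a UDP broadcast. The sketch below only builds that payload as a hex string so you can recognise it in a capture; `magic_packet_hex` is a hypothetical helper, and the wakeonlan tool does the real construction and sending for you.

```shell
# magic_packet_hex MAC
# Prints the magic-packet payload as 204 lowercase hex characters:
# "ff" x 6, then the MAC (separators stripped) repeated 16 times.
magic_packet_hex() {
  mp_mac=$(printf '%s' "$1" | tr -d ':-' | tr 'A-F' 'a-f')
  mp_out=ffffffffffff
  mp_i=0
  while [ "$mp_i" -lt 16 ]; do
    mp_out=$mp_out$mp_mac
    mp_i=$((mp_i + 1))
  done
  printf '%s\n' "$mp_out"
}

# Example use:
#   magic_packet_hex 00:11:22:aa:bb:cc
```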

5. NIC driver issues

RTL8125 (Realtek 2.5GbE)

The in-kernel r8169 driver handles the RTL8125 but has unreliable WoL. The vendor r8125 driver fixes this.

Add to your NixOS config:

boot.extraModulePackages = with config.boot.kernelPackages; [ r8125 ];
boot.blacklistedKernelModules = [ "r8169" ];

After rebuild, verify the driver:

ethtool -i eno1 | grep driver
# should show: driver: r8125

6. PCI bridge wakeup (PME propagation)

Even with WoL enabled on the NIC, the NIC’s wake signal (PME – Power Management Event) must propagate through the PCI bridge to reach the CPU. Some BIOS implementations do not enable PME on intermediate bridges.

Check if PME is enabled on the bridge:

# Find the NIC's PCI address
lspci | grep -i ethernet
# e.g., 05:00.0 Ethernet controller: Intel ...

# Find its parent bridge
lspci -t
# Look for the tree path to your NIC

# Check PME on the bridge
sudo lspci -vvs 00:1c.0 | grep -i pme
# Look for "PME-Enable+" (good) or "PME-Enable-" (bad)

If PME is disabled on the bridge, enable it with a udev rule:

services.udev.extraRules = ''
  # Enable PME on PCI bridge for NIC WoL
  ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x8086", ATTR{device}=="0x7ab8", RUN+="${pkgs.pciutils}/bin/setpci -s %k CAP_PM+04.W=0100:0100"
'';

The setpci command sets the PME_En bit in the PCI PM capability. Replace the vendor/device IDs with those from your bridge:

sudo lspci -nn -s 00:1c.0
# e.g., 00:1c.0 PCI bridge [0604]: Intel Corporation ... [8086:7ab8]

Finding the right bridge

If lspci -t is hard to read, use this to trace the full path from NIC to root:

# Starting from the NIC PCI address (e.g., 05:00.0)
cd /sys/bus/pci/devices/0000:05:00.0
ls -la ..  # parent bridge
# Follow symlinks up until you reach the root bridge

Each bridge in the chain must have PME enabled for WoL to work. In practice, it is usually only one bridge between the NIC and the root that needs fixing.
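The same walk can be scripted, because the resolved sysfs path of a PCI device already contains every bridge above it. A minimal sketch, assuming 4-digit PCI domains as in the examples above; `pci_chain` is a hypothetical helper name.

```shell
# pci_chain RESOLVED_SYSFS_PATH
# Prints every PCI function found in the path, device first and the
# bridge closest to the root last. Non-PCI components such as
# "pci0000:00" are skipped.
pci_chain() {
  printf '%s\n' "$1" | tr '/' '\n' \
    | grep -E '^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$' \
    | awk '{ line[NR] = $0 } END { for (i = NR; i >= 1; i--) print line[i] }'
}

# Example use:
#   pci_chain "$(readlink -f /sys/bus/pci/devices/0000:05:00.0)"
```

Each address after the first is a bridge you can pass to lspci -vvs to check its PME state.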

7. Still not working

If WoL still fails after all the above:

  1. Check dmesg on wake – after waking with the power button, look for clues:

    dmesg | grep -i -E "wake|suspend|pme|wol"
    
  2. Try a different NIC – if your motherboard has multiple ethernet ports, try WoL on each one. Onboard Intel NICs are most reliable.

  3. Test with a minimal NixOS config – remove all non-essential services and test WoL in isolation. If it works minimal but not full, bisect your config.

  4. Check the NIC firmware – some NICs need firmware loaded at boot for WoL. Check dmesg | grep firmware for errors.


Fan control

This guide covers how to drive chassis fans from HDD temperatures on a NixOS NAS using the braid.fanControl module.

Read this if you want quieter idle and predictable ramp under sustained disk load – BIOS fan curves cannot see HDD temperatures, only CPU and motherboard temperatures.

Why HDD-driven fan control

HDD longevity drops as drives run hotter, so the goal is to keep them under a target temperature – a widely used rule of thumb is ~40 C. The catch is that the BIOS fan curve can’t see drive temperature; it reads only CPU package temp and a motherboard sensor. So no matter how the BIOS ramps the chassis fans, nothing in that loop is actually watching the drives. The BIOS already protects the CPU regardless of its TDP – the drives are the part left unmonitored.

The fix is to move fan control into Linux userspace using drive temps as the signal. The kernel’s drivetemp module exposes each SATA drive’s SMART temperature as a standard hwmon input, and hddfancontrol reads those inputs and drives the chassis fan’s PWM proportionally to the hottest drive.

braid.fanControl wraps hddfancontrol so you only provide two hardware-specific values (the Super I/O platform device name and PWM channel number from pwmconfig) plus two calibration values (pwm.minStart/pwm.maxStop from hddfancontrol pwm-test). The module handles the systemd service, drivetemp loading, SATA hotswap udev rules, and crash recovery.

Scope: braid.fanControl monitors all visible SATA devices, not only braid pool members. Drives generate heat regardless of LUKS state, pool membership, or mount status – binding fan control to pool state would leave warm disks uncooled when the pool is locked or before first unlock. SAS drives are out of scope.

The stack

| Layer | Role |
| --- | --- |
| drivetemp (kernel) | Exposes each SATA drive’s SMART temp as an hwmon input |
| Super I/O driver (kernel) | Board-specific (nct6775, f71882fg, it87, …) – drives the chassis fan PWM headers |
| lm_sensors (userspace) | Provides sensors, sensors-detect, pwmconfig for discovery |
| hddfancontrol (userspace) | Reads drivetemp hwmon inputs for all SATA drives, ramps PWM from the hottest |
| braid.fanControl (NixOS) | Runs hddfancontrol as a systemd service, handles SATA hotswap and crash recovery |

Setup has two phases: interactive discovery on the running machine (one-time), then committing the result to Nix.

Prerequisites

  • BIOS: put chassis fan headers into software/manual control, and match the header mode to the fan type – PWM for 4-pin fans, DC (voltage) for 3-pin fans. Getting this wrong leaves the fan either stuck at a fixed speed or uncontrollable from userspace. If unsure, pwmconfig’s spin-down test (below) will tell you: a fan on the wrong header mode will not ramp down.
  • Leave the CPU fan header on BIOS auto. Don’t fight the board’s package thermal logic with userspace – the BIOS is better at protecting the CPU than you are.

Discovery

Discovery is a one-time interactive procedure. Its only output is four values you paste into Nix at the end:

  • pwm.platformDevice – platform device name of the Super I/O chip (e.g. f71882fg.656)
  • pwm.number – PWM channel number on that chip (e.g. 2 for pwm2)
  • pwm.minStart – PWM value needed to start the fan from standstill
  • pwm.maxStop – PWM value below which the spinning fan stalls

Install the tooling and load the sensor modules

braid.fanControl loads drivetemp automatically, but the interactive operator tools (sensors, sensors-detect, pwmconfig, hddfancontrol) must be on your PATH during discovery. Add them now, plus your board’s Super I/O driver once you have identified it. They are only required for discovery, but they can stay in the committed config so future re-runs after drive swaps or chassis changes have the same tools available:

{ pkgs, ... }:
{
  environment.systemPackages = [ pkgs.lm_sensors pkgs.hddfancontrol ];
  boot.kernelModules = [ "coretemp" ];  # drivetemp added by braid.fanControl
}

Rebuild, then confirm you see per-drive temps:

sensors | grep -A1 drivetemp

You should see one drivetemp-scsi-*-0 block per SATA drive, each showing a current temp1 reading. drivetemp must be loaded before you run pwmconfig, or drive temps will not appear as eligible fan inputs.

Find your Super I/O chip

Run sudo sensors-detect and accept the defaults. When it asks whether to write /etc/modules-load.d/lm_sensors.conf, answer no – on NixOS, kernel modules are declared in boot.kernelModules, not in /etc.

At the end sensors-detect prints a summary. For most boards it names a driver (nct6775, it87, …); add that driver to boot.kernelModules alongside coretemp, rebuild, and confirm a new block appears in sensors showing fan RPMs and PWMs.

If the summary says Found unknown chip with ID 0xXXXX, sensors-detect’s chip-ID table has fallen behind the kernel. The kernel driver may already support your chip even though the detect script doesn’t recognize it. Grep the ID in the kernel source to find the driver:

# on github, search drivers/hwmon/*.c in torvalds/linux for the ID
# e.g., 0x1502 turns up in drivers/hwmon/f71882fg.c, so the module is f71882fg

Add the module you found to boot.kernelModules. If modinfo <module> works and sensors still shows no new block after rebuild, move on to the next section.

“Device or resource busy” on module load

If dmesg shows your Super I/O driver correctly identifying the chip but modprobe fails with Device or resource busy, ACPI has reserved the hwmon I/O region. The fix is a kernel parameter:

boot.kernelParams = [ "acpi_enforce_resources=lax" ];

This requires a full reboot – kernel command line changes don’t apply on nixos-rebuild switch alone. After the reboot, sensors should show a block for your Super I/O chip with fan RPMs and PWMs.

Map fans to PWMs with pwmconfig

pwmconfig identifies which PWM controls which fan by briefly stopping each fan in turn. Run it when drives are idle (not mid-scrub or rebuild) – a stalled fan during sustained write load is a bad place to be.

Before starting, record each PWM’s current enable value. pwmconfig flips them to manual (1) to run its spin-down test, and the meaning of other values is driver-specific (e.g. f71882fg uses 0=off / 1=manual / 2=auto; other drivers differ). Restoring the original is safer than hard-coding a mode:

for p in /sys/class/hwmon/*/device/pwm[0-9]_enable; do
  printf '%s = %s\n' "$p" "$(cat "$p")"
done

Save that output somewhere you can read after pwmconfig exits. Then run:

sudo pwmconfig

It walks each PWM, asks whether to switch it to manual (say yes so the spin-down test can run), then stops each fan briefly and asks which fanN_input reading dropped. Answer based on what you observe in the tool’s output.

After identification, it asks which fans to configure. Pick only the chassis fans. Skip the CPU PWM – leave it BIOS-controlled. Also skip any PWM whose fan did not respond (unpopulated header, or fan/header mode mismatch in BIOS).

pwmconfig writes an /etc/fancontrol file at the end. You won’t use that file (braid uses hddfancontrol, not vanilla fancontrol), but the tool’s spin-down output is still how you identify the PWM path and measure stall behavior. Record the PWM sysfs path for the chassis fan – something like /sys/devices/platform/<super-io>/hwmon/hwmonN/device/pwmN.

Translate the PWM path to a platform device

braid.fanControl takes the stable platform device name plus the PWM channel number, and resolves the (unstable) hwmonN segment at service start. Translate the pwmconfig-surfaced sysfs path with:

pwm=/sys/class/hwmon/hwmon4/device/pwm2  # from pwmconfig output
pwm_dir=$(dirname "$pwm")
if [ "$(basename "$pwm_dir")" != device ]; then
  pwm_dir="$pwm_dir/device"
fi
basename "$(readlink -f "$pwm_dir")"
# -> f71882fg.656

The if branch handles both sysfs layouts: hwmon*/device/pwmN (common on f71882fg, nct6775) and hwmon*/pwmN (fallback). Without it, the fallback layout resolves to hwmon4 instead of the platform device.

The PWM number is the numeric suffix on the pwmN filename (2 in the example above).

After pwmconfig exits, restore each skipped PWM to the value you recorded:

echo <original> | sudo tee /sys/class/hwmon/hwmon<N>/device/pwm<K>_enable
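If you have several PWM channels, recording and restoring by hand gets error-prone. The pair of helpers below is one possible way to script it; the function names and the save-file location are hypothetical, and the restore step writes back every recorded entry, so delete the line for the chassis PWM you are handing to braid.fanControl before restoring.

```shell
# save_pwm_modes FILE PATH...
# Records "path value" for each given pwmN_enable file.
save_pwm_modes() {
  sp_out=$1
  shift
  : > "$sp_out"
  for sp_path in "$@"; do
    printf '%s %s\n' "$sp_path" "$(cat "$sp_path")" >> "$sp_out"
  done
}

# restore_pwm_modes FILE
# Writes each recorded value back to its path.
restore_pwm_modes() {
  while read -r rm_path rm_value; do
    printf '%s\n' "$rm_value" > "$rm_path"
  done < "$1"
}

# Example use (as root, before and after pwmconfig):
#   save_pwm_modes /root/pwm-modes.txt /sys/class/hwmon/hwmon*/device/pwm[0-9]_enable
#   restore_pwm_modes /root/pwm-modes.txt
```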

Measure minStart and maxStop with hddfancontrol pwm-test

hddfancontrol pwm-test ramps the PWM up and down while measuring fan RPM. It finds:

  • pwm.minStart – the PWM at which a stopped fan begins spinning again
  • pwm.maxStop – the highest PWM at which a spinning fan stalls

Run it against the chassis PWM path from the previous step:

sudo hddfancontrol pwm-test -p /sys/devices/platform/.../pwmN

It takes a couple of minutes (ramps slowly to avoid bouncing the fan). Record the final minStart and maxStop values it prints.

If the fan has a hardware RPM floor (common on voltage-controlled 3-pin fans, and some boards’ chassis headers even in PWM mode), pwm.maxStop will be 0 and pwm.minStart will be some low value – the fan never actually stops. That’s fine; hddfancontrol still handles the ramp correctly. The --min-fan-speed-prct floor in braid.fanControl prevents the daemon from commanding the fan off in any case.

Committing to Nix

Fan control is a braid sub-feature: it activates only when braid.enable = true (see Getting started). The recipes below show the full braid block; merge the non-braid lines (boot.*, environment.systemPackages) into your existing config.

Minimal recipe

Paste the four discovery values into braid.fanControl:

{ pkgs, ... }:
{
  environment.systemPackages = [ pkgs.lm_sensors ];   # optional: tools for re-running discovery
  boot.kernelModules = [ "coretemp" "nct6775" ];      # your Super I/O driver here
  # boot.kernelParams = [ "acpi_enforce_resources=lax" ];  # only if needed

  braid = {
    enable = true;            # fan control only runs when the braid module is enabled

    fanControl = {
      enable = true;
      pwm = {
        platformDevice = "nct6775.656";
        number = 2;
        minStart = 65;   # from hddfancontrol pwm-test
        maxStop  = 60;   # from hddfancontrol pwm-test
      };
    };
  };
}

The module resolves the PWM sysfs path at service start by globbing /sys/devices/platform/<platformDevice>/hwmon/hwmon*/{device/,}pwm<number>, which handles hwmonN renumbering across reboots. The platform device name (nct6775.656, f71882fg.656, etc.) is stable.
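To preview which file that glob would pick on your machine, the lookup can be approximated in a few lines of shell. This is a sketch of the idea, not the module's actual code; `resolve_pwm` is a hypothetical helper, and the first argument exists only so the function can be pointed at a directory other than /sys/devices/platform.

```shell
# resolve_pwm PLATFORM_ROOT PLATFORM_DEVICE PWM_NUMBER
# Prints the first existing pwm file under either sysfs layout
# (hwmon*/device/pwmN, then hwmon*/pwmN); returns 1 if none exists.
resolve_pwm() {
  for rp_candidate in \
    "$1/$2"/hwmon/hwmon*/device/pwm"$3" \
    "$1/$2"/hwmon/hwmon*/pwm"$3"
  do
    if [ -e "$rp_candidate" ]; then
      printf '%s\n' "$rp_candidate"
      return 0
    fi
  done
  return 1
}

# Example use:
#   resolve_pwm /sys/devices/platform nct6775.656 2
```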

Sane defaults for the rest:

  • minTemp = 30 / maxTemp = 40 – fan floors below 30 C, ramps to full at 40 C
  • minFanSpeedPercent = 20 – fan never drops below 20% of range (conservative; upstream hddfancontrol default)
  • interval = "30s" – 30-second polling interval

Override any of these in the braid.fanControl block if you want a different curve. See NixOS configuration for the full option table.

Tuning the curve

  • Ramp starts too soon / fan audibly spools early on idle: raise minTemp (try 32-34).
  • Drives climbing past 42-44 C under sustained load: lower maxTemp (try 38) or raise minFanSpeedPercent (try 30).
  • Fan noticeably oscillating: raise interval (e.g. "60s"). HDDs heat slowly, so aggressive polling only adds jitter.

Additional sensor modules

For ECC DIMM temp monitoring (visible in sensors; not used by hddfancontrol directly):

boot.kernelModules = [ "coretemp" "nct6775" "jc42" ];

Verification

Watch both the drivetemp input and the PWM/RPM, not RPM alone. CPU heat or ambient temperature can produce a false-positive fan ramp if you’re eyeballing only RPM.

A self-contained way to generate heat on a braid NAS is to run a scrub (the pool is already btrfs). It reads every extent on every drive, which is representative NAS load and needs no pre-staged payload. The example below uses /mnt/storage as a concrete mount point – substitute your own pool mount:

# pane 1: start the scrub
sudo btrfs scrub start /mnt/storage

# pane 2: watch the thermal signals
watch -n2 sensors

# pane 3: follow the hddfancontrol daemon log
journalctl -u hddfancontrol-braid -f

Expected: drive temps climb 3-8 C over 10+ minutes (HDDs heat slowly), and the PWM tracks in step per your minTemp/maxTemp curve. The daemon log prints temperature readings and speed changes as it polls.

Cancel anytime with:

sudo btrfs scrub cancel /mnt/storage

If drive temps climb but PWM doesn’t move, double-check the resolved PWM file is writable (ls -l /sys/devices/platform/<platformDevice>/hwmon/hwmon*/{device/,}pwm<number>) and that hddfancontrol-braid is running (systemctl status hddfancontrol-braid).

Monitoring commands

Quick reference for monitoring the fan control loop on a running system. These paths assume an f71882fg-family Super I/O; substitute your platform device if different.

# Live chassis fan RPM + drive temps
watch -n2 'cat /sys/devices/platform/f71882fg.656/fan2_input; sensors drivetemp-*'

# Follow daemon log (temp readings, speed changes)
journalctl -u hddfancontrol-braid -f

# Current PWM value (0-255)
cat /sys/devices/platform/f71882fg.656/pwm2

# All fan channels at a glance (RPM, PWM, control mode)
for i in 1 2 3; do echo "fan${i}: $(cat /sys/devices/platform/f71882fg.656/fan${i}_input) RPM, pwm${i}: $(cat /sys/devices/platform/f71882fg.656/pwm${i}), enable: $(cat /sys/devices/platform/f71882fg.656/pwm${i}_enable)"; done

# Service status
systemctl status hddfancontrol-braid

# All hwmon sensors (CPU, board, DIMM, drives)
sensors

# SMART details for a specific drive
sudo smartctl -a /dev/sda

On f71882fg, the pwmN_enable values are 0=off, 1=manual (hddfancontrol sets this), and 2=BIOS auto; other drivers may number the modes differently. hddfancontrol is configured with --restore-fan-settings, so a clean service stop restores the original enable mode.

TUI fans panel

braid tui’s Data tab gains a Fans row when fan control is enabled. The section title shows daemon: status for hddfancontrol-braid.service; the row shows current PWM/RPM, the Driving column names the hottest drive setting the curve, and the Curve column shows the configured temperature-to-speed range. The panel polls every 5 seconds. Press r to refresh both pool and fan probes immediately.

When braid.fanControl isn’t enough

braid.fanControl drives a single chassis PWM from the hottest SATA drive. That covers the common NAS case. If you need more control – multiple PWMs with different curves, PID-based responsiveness, non-SATA drive temperature sources – the usual escape hatches:

  • Configure services.hddfancontrol directly (nixpkgs’s module supports multiple daemons, per-fan config).
  • fan2go – Go daemon; supports multiple sensors and PID curves.
  • CoolerControl – more featureful, GUI-oriented.

Disable braid.fanControl.enable and bring your own solution. The drivetemp kernel module that braid loads is the only piece you’d need to keep.

Worked example: ASRock Industrial IMB-X1231

A concrete walk-through of the discovery phase on one board, to show what the unknown-chip and ACPI-busy paths look like in practice.

Hardware

  • Board: ASRock Industrial IMB-X1231 (mini-ITX, 12th/13th gen Intel)
  • CPU: Intel i3-14100T (35W TDP)
  • Memory: ECC SODIMM with a jc42-compatible thermal sensor on SMBus
  • Chassis fan: single 120mm rear, voltage-controlled (not 4-pin PWM)

sensors-detect reported an unknown chip

Probing for Super-I/O at 0x2e/0x2f
Trying family `VIA/Winbond/Nuvoton/Fintek'...               Yes
Found unknown chip with ID 0x1502
    (logical device 4 has address 0x290, could be sensors)
...
Probing for `National Semiconductor LM78' at 0x290...       Success!
    (confidence 6, driver `lm78')

The lm78 hit is a false positive: it’s a 1995 chip whose ISA probe signature collides with anything living at 0x290. The real chip is whatever has devid 0x1502. Ignore the lm78 recommendation.

Finding the right driver in kernel source

A grep of drivers/hwmon/ in torvalds/linux for 0x1502 pointed at f71882fg.c:

#define SIO_F81866_ID    0x1010
#define SIO_F81966_ID    0x1502
/* ... */
case SIO_F81866_ID:
case SIO_F81966_ID:

So: Fintek F81966, register-compatible with the F81866, driven by the f71882fg module – which has supported this ID since kernel 5.16. lm_sensors 3.6.2’s chip-ID table just hadn’t caught up.

ACPI held the hwmon I/O region

After adding f71882fg to boot.kernelModules and rebuilding, the driver identified the chip in dmesg:

f71882fg: Found f81866a chip at 0x290, revision 48, devid: 1502

But modprobe failed with Device or resource busy. Added boot.kernelParams = [ "acpi_enforce_resources=lax" ], rebooted, and sensors showed the full Super I/O block:

f81866a-isa-0290
  fan1: 1489 RPM       <- CPU cooler
  fan2: 1573 RPM       <- rear chassis, voltage-controlled
  fan3:    0 RPM       <- unpopulated header
  pwm1: 58%  pwm2: 58%  pwm3: 72%
  temp1: 36.0 C  temp2: 20.0 C  temp3: 37.0 C

pwmconfig mapped the fans

The spin-down test confirmed:

  • pwm2 -> fan2 (chassis fan; voltage-controlled – RPM floors at ~395 and never fully stops regardless of PWM). Selected for braid.fanControl.
  • pwm1 -> fan1 (4-pin PWM CPU fan; stopped cleanly at PWM=60). Skipped – left on BIOS auto.
  • pwm3 -> no fan (unpopulated header). Also left on auto.

hddfancontrol pwm-test

sudo hddfancontrol pwm-test -p /sys/devices/platform/f71882fg.656/hwmon/hwmon4/device/pwm2
...
minStart: 65
maxStop:  60

Final Nix config

boot.kernelModules = [ "coretemp" "f71882fg" "jc42" ];
boot.kernelParams  = [ "acpi_enforce_resources=lax" ];

braid = {
  enable = true;

  fanControl = {
    enable = true;
    pwm = {
      platformDevice = "f71882fg.656";
      number = 2;
      minStart = 65;
      maxStop  = 60;
    };
  };
};

End-to-end check

With drive temp at 32 C and the default curve (minTemp=30, maxTemp=40, minFanSpeedPercent=20), the expected output is pwm2 = 20% + (32 - 30) / (40 - 30) * 80% = 36% of the PWM range above maxStop. Observed: pwm2 climbed smoothly with drive temp under a scrub, and RPM tracked the pwmconfig correlation table, so the control loop matches the arithmetic.
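The curve arithmetic can be checked for other temperatures with a small helper. This is a sketch of the linear interpolation as described in this guide, using whole-degree temperatures and integer math; it is not taken from hddfancontrol's source, and `fan_speed_percent` is a hypothetical name.

```shell
# fan_speed_percent TEMP MIN_TEMP MAX_TEMP MIN_PERCENT
# Linear ramp from MIN_PERCENT at MIN_TEMP to 100 at MAX_TEMP,
# clamped at both ends. Result is a percentage of the usable PWM range.
fan_speed_percent() {
  if [ "$1" -le "$2" ]; then
    echo "$4"
  elif [ "$1" -ge "$3" ]; then
    echo 100
  else
    echo $(( $4 + ($1 - $2) * (100 - $4) / ($3 - $2) ))
  fi
}

# Example use with the default curve:
#   fan_speed_percent 32 30 40 20
```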


UPS

This guide covers enabling UPS (uninterruptible power supply) support on a braid NAS via NUT (Network UPS Tools).

Enabling UPS support (braid.enable = true plus braid.ups.enable = true) turns on three behaviors:

  • Orderly poweroff on low battery. When the UPS reports critical, NUT’s upsmon invokes systemctl poweroff. systemd unwinds braid-online.service’s ExecStop, which runs braid lock and cleanly unmounts the pool before the battery exhausts.
  • Preflight refusal without verified utility power. braid add / remove / remove-missing / replace check UPS state at startup and refuse to begin a pool mutation unless the UPS reports verified utility power (OL). This narrows the surface that journal recovery needs to cover.
  • Live state visibility. braid ups status (and the TUI Data tab) show the parsed upsc output: status flags, battery charge, runtime remaining, load, estimated watts, input voltage, and device info.

Scope

v1 supports a single USB-connected UPS on the NAS, monitored by the NAS itself (single-host standalone). Non-USB drivers work through the escape hatch but are not tested.

Minimal config

# configuration.nix
{
  braid = {
    enable = true;
    ups.enable = true;
  };
}

Defaults: name = "ups", driver = "usbhid-ups", port = "auto". Rebuild and plug the UPS’s USB cable in; NUT’s auto-detect finds the device.

Override the driver or port for non-USB UPSes:

braid = {
  enable = true;

  ups = {
    enable = true;
    name = "myups";
    driver = "apcsmart";
    port = "/dev/ttyS0";
  };
};

Checking status

# Curated human summary
sudo braid ups status

# Machine-readable JSON (stable shape for scripts)
sudo braid ups status --json | jq .

Example human output:

UPS: ups
Status: OL
Battery: 100%
Runtime: 30m 0s
Load: 17% (56 W estimated)
Input: 120.0 V (transfer 88-142 V)
Device: APC Back-UPS ES 550G
Battery manufactured: 2023/04/12
Last test: Done and passed

The watts line is labeled estimated and omitted entirely if the UPS does not report both ups.load and ups.realpower.nominal.

The --json output serializes the full parsed model. Distinct error sentinels are emitted for the common non-OK cases:

| Condition | JSON shape | Exit code |
| --- | --- | --- |
| UPS reachable with populated ups.status | serialized UpscOutput | 0 |
| UPS reachable but ups.status empty | serialized UpscOutput plus "warning": "ups_status_empty" | 0 |
| UPS query failed | {"error": "query_failed", "detail": "exit <code>: <stderr>"} | 1 |
| UPS invocation failed (upsc could not run – missing on PATH, killed by signal, or other runner-level failure) | {"error": "invocation_failed", "detail": "command failed: upsc ups: <reason>"} | 1 |
| UPS not enabled | {"error": "ups_not_enabled"} | 0 |

TUI UPS panel

braid tui’s Data tab gains a UPS row when UPS support is enabled. Status text is color-coded by severity:

  • Green – OL (on utility power).
  • Yellow – OB (on battery, not yet critical).
  • Red – LB / TESTFAIL / COMMBAD / FSD (critical; shutdown imminent or comms-loss).
  • DarkGray – UPS query failed, or no UPS state available yet.

The panel polls on the same 5-second cadence as the TUI fans panel. Press r to refresh both pool and UPS probes immediately.

What happens on low battery

  1. upsmon sees ups.status: OB LB, declares the UPS critical.
  2. upsmon runs its configured SHUTDOWNCMD. braid overrides the nixpkgs default (shutdown now) with systemctl poweroff.
  3. systemd walks its shutdown sequence. braid-online.service stops with ExecStop running braid lock, which unmounts the pool and closes LUKS.
  4. The host powers off before the battery exhausts.

Under the default upsmon timings (POLLFREQ = POLLFREQALERT = 5, FINALDELAY = 5) the window between LB detection and poweroff is ~10 seconds, plus however long braid lock takes. The default battery.runtime.low in most UPS drivers is around 120 seconds, which is enough headroom for a single-disk pool’s clean teardown. Larger pools may need a wider battery.runtime.low (set at the NUT level, not through braid).
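As a rough budget, the time left for braid lock is the low-runtime threshold minus the detection and delay windows. The helper below just does that subtraction; it assumes the UPS's runtime estimate is accurate, which it often is not on aged batteries, so treat the result as an upper bound. `teardown_budget` is a hypothetical name.

```shell
# teardown_budget RUNTIME_LOW POLLFREQ FINALDELAY
# Seconds remaining for the shutdown sequence after LB is detected,
# assuming worst-case detection latency of one poll interval.
teardown_budget() {
  echo $(( $1 - $2 - $3 ))
}

# Example use with the defaults quoted above:
#   teardown_budget 120 5 5
```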

Mutation refusal when utility power is not verified

With UPS enabled, braid add / remove / remove-missing / replace refuse to start unless upsc returns a non-empty status set that contains OL and no known blocker. The refusal cases are:

  • on-battery (OB)
  • a critical flag the TUI paints red (LB / TESTFAIL / COMMBAD / FSD)
  • OL missing from an otherwise non-blocking status set
  • upsc query or invocation failure (stopped daemon, unknown UPS name, or another fatal NUT error – the message includes upsc’s stderr when it exits non-zero)
  • an empty or missing ups.status

Known non-critical advisory states such as OL RB, and unknown tokens co-present with OL and no known blocker, still pass: the OL flag is the affirmative utility-power proof, not a guarantee of full battery health. Example refusal:

$ sudo braid add newdisk=/dev/disk/by-id/ata-TOSHIBA_NEW
error: cannot verify UPS is on utility power (UPS reports on-battery)
-- refusing to start add. Check 'braid ups status', restore utility
power, then retry.

Recovery: run braid ups status to confirm, fix the UPS/NUT state, restore utility power, wait for the status to return to a trusted OL, and retry the command.
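The status-set rule can be sketched in a few lines of shell. This mirrors only the documented rule (non-empty, contains OL, no known blocker); `ups_allows_mutation` is a hypothetical name, and the upsc invocation-failure case is not modeled.

```shell
#!/bin/sh
# Sketch of the documented gate, not braid's code.
ups_allows_mutation() {
    have_ol=no
    for token in $1; do
        case "$token" in
            OB|LB|TESTFAIL|COMMBAD|FSD) return 1 ;;  # known blocker: refuse
            OL) have_ol=yes ;;                       # affirmative utility-power proof
        esac
    done
    [ "$have_ol" = yes ]    # empty sets and sets without OL fail here
}

for s in "OL" "OL RB" "OB" "OL LB" "RB" ""; do
    if ups_allows_mutation "$s"; then verdict=pass; else verdict=refuse; fi
    printf '%-8s -> %s\n' "\"$s\"" "$verdict"
done
```

Note that "OL RB" passes while "RB" alone does not: advisory tokens never block, but they also never substitute for OL.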

Two clarifications:

  • braid doctor’s ups_daemon: ok means the configured NUT daemon is reachable; it is not a guarantee that mutating-command preflight will pass. The refusal error from add / remove / remove-missing / replace is the primary channel for the exact mutation-readiness blocker.
  • The OL gate assumes the configured NUT driver reports OL on utility power as documented by NUT. If a device or driver violates that contract, inspect with braid ups status; the recovery is to fix the NUT driver/config or disable braid.ups until the UPS state can be trusted.

doctor checks

braid doctor adds two UPS-adjacent checks when UPS support is enabled:

  • ups daemon – fails if upsc is missing or cannot be spawned, because braid cannot verify the enabled UPS shutdown path. It warns if upsc runs but cannot reach the daemon or exits non-zero. Fix missing upsc by checking the braid wrapper/NUT package path; fix daemon reachability with systemctl status upsd.service.
  • braid-online – fails (high severity) if the pool is mounted but braid-online.service is not active. Without that service active, the UPS shutdown path does not unmount the pool, and the safety guarantee silently breaks. Fix by running systemctl start braid-online.service or braid unlock.

Both checks skip when UPS support is disabled, and the braid-online check additionally skips when the pool is not mounted (there is nothing for the UPS shutdown path to unmount).

v1 limitation: no async alert

There is no asynchronous notification when the UPS goes on battery or loses comms in v1. Operators who are not actively watching braid ups status or the TUI will not see those conditions – only the orderly shutdown on LB is automatic.

This is deliberate: integrating UPS events into braid’s shared alert model requires splitting AlertCause by persistence semantics (latched-until-ack for disk errors, active-while-condition-holds for live UPS states) and is out of scope for v1. See decisions/020-ups-integration.md for the open-question status.

If you need asynchronous UPS notifications today, wire NUT’s NOTIFYCMD directly – braid does not touch it.


NixOS configuration

Complete reference for the braid NixOS module options. Read this when setting up braid for the first time or tuning behavior after initial setup.

Minimal config

# flake.nix
{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-26.05";
    braid.url = "github:danneu/braid?ref=release";
  };

  outputs = { nixpkgs, braid, ... }: {
    nixosConfigurations.myhost = nixpkgs.lib.nixosSystem {
      system = "x86_64-linux";
      modules = [
        braid.nixosModules.default
        ./configuration.nix
      ];
    };
  };
}

?ref=release pins braid to its release channel: a moving branch that is fast-forwarded to each new release tag. Running nix flake update braid is then the “upgrade to the newest release” button, while flake.lock still pins the exact rev. The snippet deliberately omits braid.inputs.nixpkgs.follows – see Binary cache and Tool overrides for why no-follows is the default.

The braid.url input takes any of:

# flake.nix
braid.url = "github:danneu/braid?ref=release";   # newest release (default)
braid.url = "github:danneu/braid?ref=v0.0.1";    # pin a tag
braid.url = "github:danneu/braid?rev=<commit>";  # pin an exact commit

# configuration.nix
braid = {
  enable = true;
};

nixosModules.default supplies braid.package automatically. Override it only to build the CLI yourself.

Binary cache

braid publishes a prebuilt x86_64-linux CLI to a public Cachix cache on every release. Add it so the NAS pulls the binary instead of recompiling Rust:

# configuration.nix
nix.settings = {
  extra-substituters = [ "https://braid.cachix.org" ];
  extra-trusted-public-keys = [ "braid.cachix.org-1:I/p7fx1z5n0+O80KzMuT7aXRdkVyHr/buZKaBu7HvJs=" ];
};

The cache only hits when braid resolves to the exact store path CI built – that is, with the recommended no-follows setup above. Setting braid.inputs.nixpkgs.follows rebuilds braid against your nixpkgs, producing a different path and a cache miss. See Toolchain pinning and ADR 029.

What you get for free

When braid.enable = true, the module sets up:

  • Monthly btrfs scrub – timer + service tied to pool lifecycle. Configurable via braid.autoScrub.
  • Resilient boot – a dead drive never blocks boot. LUKS open and btrfs mount are deferred to braid unlock, not wired into boot.initrd.
  • Pinned toolchain – btrfs-progs, cryptsetup, util-linux, NUT, smartmontools, and ethtool are pinned to NixOS stable versions. Override with braid.packages.* if needed.
  • Shell completions – bash, zsh, and fish completions registered automatically via clap_complete.
  • smartd integration – services.smartd enabled by default with a braid-owned alert script. SMART failures trigger the braid alert service.
  • Storage group – a storage group is created; mount point is set to root:storage 2770 after unlock. See Sharing and permissions.
  • Disk health monitoring – polls btrfs device stats every 5 minutes, audible beep on errors. Configurable via braid.monitor.
  • Fan control (opt-in) – drive chassis fans from the hottest SATA drive temp. Handles hddfancontrol, SATA hotswap restart, crash recovery. Configurable via braid.fanControl.

Module options

Core

| Option | Type | Default | Description |
|---|---|---|---|
| braid.enable | bool | false | Enable the braid module |
| braid.package | package or null | null | The braid CLI package; nixosModules.default defaults it to braid-cli-unwrapped |
| braid.mountPoint | path | /mnt/storage | Where to mount the btrfs pool |
| braid.poolAccessGroup | string or null | "storage" | Group for mount point access. null to disable |
| braid.lockSystemdStopDeadlineSecs | positive int | 270 | Seconds to wait for the pool lock during braid-online.service ExecStop; must stay below the unit’s TimeoutStopSec |

Tool overrides

| Option | Type | Default | Description |
|---|---|---|---|
| braid.packages.cryptsetup | package | pkgs.cryptsetup | cryptsetup package |
| braid.packages.btrfsProgs | package | pkgs.btrfs-progs | btrfs-progs package |
| braid.packages.utilLinux | package | pkgs.util-linux | util-linux package |
| braid.packages.nut | package | pkgs.nut | NUT package |
| braid.packages.smartmontools | package | pkgs.smartmontools | smartmontools package |
| braid.packages.ethtool | package | pkgs.ethtool | ethtool package |

Override these only if you need a specific version for compatibility testing. The recommended setup omits braid.inputs.nixpkgs.follows (the Minimal config example above), so nixosModules.default sources these defaults from braid’s own pinned nixpkgs (nixos-26.05) – the same versions the release binary cache is built against, so braid is a cache hit. Adding braid.inputs.nixpkgs.follows = "nixpkgs" is an advanced opt-out: it dedups your closure but rebuilds braid against your nixpkgs (a cache miss) and sources these tools from your nixpkgs instead. If you take it, keep your nixpkgs on the same NixOS stable release braid targets so the parsed tool output stays compatible – see Toolchain pinning.

Auto-scrub

| Option | Type | Default | Description |
|---|---|---|---|
| braid.autoScrub.enable | bool | true | Enable periodic btrfs scrub |
| braid.autoScrub.interval | string | "monthly" | systemd calendar expression |

The scrub timer is lifecycle-aware: it starts when the pool comes online and stops when the pool goes offline. Persistent = true ensures a missed scrub runs on next unlock (e.g. the pool was locked over a monthly boundary).

braid’s scrub conflicts with the NixOS built-in services.btrfs.autoScrub. If both are enabled, evaluation fails with a clear error. Disable one or the other.

Monitoring

| Option | Type | Default | Description |
|---|---|---|---|
| braid.monitor.enable | bool | true | Enable disk health monitoring |
| braid.monitor.interval | string | "5min" | Polling interval (systemd time span) |
| braid.monitor.beep | bool | true | Audible PC speaker beep on alert |
| braid.monitor.alertCommand | string or null | null | Custom command to run on alert |

When beep = true, the module unblacklists the pcspkr kernel module, creates a beep group, and sets up a udev rule for PC speaker access. The beep loops with exponential backoff (5s, 10s, 20s, 40s, …) capped at once every 15 minutes, until acknowledged with braid ack.
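The backoff schedule can be made concrete with a short loop. This sketch assumes plain doubling with a clamp at 900 seconds (15 minutes); the exact point at which braid's beeper reaches the cap may differ, and `beep_delays` is a hypothetical name.

```shell
#!/bin/sh
# Print the first N delays (in seconds) of a doubling backoff: start 5s, cap 900s.
beep_delays() {
    count="$1"
    delay=5
    cap=900
    out=""
    i=0
    while [ "$i" -lt "$count" ]; do
        out="${out}${out:+ }${delay}"          # space-separated list
        delay=$((delay * 2))
        [ "$delay" -le "$cap" ] || delay=$cap  # clamp at 15 minutes
        i=$((i + 1))
    done
    echo "$out"
}

beep_delays 10
```

Under these assumptions the beep settles at the 15-minute cadence after eight doublings, a little over 20 minutes into an unacknowledged alert.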

alertCommand runs in addition to the beep (not instead of). Use it for push notifications, email, etc.:

braid.monitor.alertCommand = "curl -s -d 'Disk error on NAS' https://ntfy.sh/my-nas-alerts";

See Monitoring and alerts for the full workflow.

Auto-unlock

| Option | Type | Default | Description |
|---|---|---|---|
| braid.autoUnlock.enable | bool | false | Enable USB keyfile auto-unlock |
| braid.autoUnlock.keyDevice | string | "" | Block device path (/dev/disk/by-id/...) |
| braid.autoUnlock.timeoutSec | positive int | 5 | Seconds to wait for USB device |
| braid.autoUnlock.allowDegraded | bool | false | Mount with missing devices |

keyDevice must use a /dev/disk/by-id/ path – /dev/sdX names shift when devices are added or removed.
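If you template this value from a script, a prefix check along these lines catches the /dev/sdX mistake early. It is a hypothetical pre-flight helper, not the module's own validation, and it only tests the path prefix, not whether the device exists.

```shell
#!/bin/sh
# Hypothetical check: accept only stable /dev/disk/by-id/ paths.
is_stable_key_device() {
    case "$1" in
        /dev/disk/by-id/?*) return 0 ;;  # by-id path with a non-empty name
        *) return 1 ;;                   # /dev/sdX, empty string, anything else
    esac
}

for dev in /dev/disk/by-id/usb-Kingston_DataTraveler_XXXX-0:0 /dev/sdb1; do
    if is_stable_key_device "$dev"; then
        echo "ok:     $dev"
    else
        echo "reject: $dev"
    fi
done
```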

The auto-unlock service mounts the USB read-only, reads braid.key, unlocks the pool, and unmounts the USB immediately. The keyfile is never left accessible. If the USB is absent at boot, the service exits cleanly without blocking boot.

See Auto-unlock for the enrollment and setup workflow.

Auto-suspend

| Option | Type | Default | Description |
|---|---|---|---|
| braid.autoSuspend.enable | bool | false | Suspend NAS when idle |
| braid.autoSuspend.wolInterface | string or null | null | Network interface for Wake-on-LAN (required) |
| braid.autoSuspend.idleTime | positive int | 900 | Seconds idle before suspend |

Requires a wired ethernet interface – WiFi interfaces are rejected at evaluation time (WoL needs ethtool, which does not work for WiFi).

Activity checks that block suspend:

  • braid idle – scrub or any btrfs kernel exclusive operation (balance, device add/remove/replace, resize, swap activate)
  • Active SSH sessions
  • Active local sessions (TTY/X11/Wayland)
  • SMB connections (auto-detected if services.samba is enabled)
  • NFS connections (auto-detected if services.nfs.server is enabled)

The scrub timer is registered as a wakeup source so the NAS wakes for scheduled scrubs.

See Power management for the full workflow.

Fan control

| Option | Type | Default | Description |
|---|---|---|---|
| braid.fanControl.enable | bool | false | Drive chassis fans from HDD temps |
| braid.fanControl.pwm.platformDevice | string | (required) | Platform device name under /sys/devices/platform/ |
| braid.fanControl.pwm.number | int | (required) | PWM channel number (1-based) |
| braid.fanControl.pwm.minStart | int | (required) | Minimum PWM to start fan from standstill |
| braid.fanControl.pwm.maxStop | int | (required) | PWM below which a spinning fan stalls |
| braid.fanControl.minTemp | int | 30 | Temperature (C) at which fan runs at minimum speed |
| braid.fanControl.maxTemp | int | 40 | Temperature (C) at which fan runs at full speed |
| braid.fanControl.minFanSpeedPercent | int | 20 | Minimum fan speed % (0 = fan may stop) |
| braid.fanControl.interval | string | "30s" | Temperature polling interval |

pwm.platformDevice and pwm.number are found via pwmconfig. pwm.minStart and pwm.maxStop are measured with hddfancontrol pwm-test -p <pwm-path>. All four are hardware-specific.

Monitors all visible SATA devices (not only braid pool members). Requires a board-specific Super I/O driver in boot.kernelModules – see Fan control for the hardware discovery workflow.
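To get a feel for how minTemp, maxTemp, and minFanSpeedPercent interact, here is a sketch that assumes a linear ramp between the two temperatures and clamping outside them. The real curve is whatever hddfancontrol implements; `fan_speed_percent` is a hypothetical name and the constants are the defaults from the table.

```shell
#!/bin/sh
# Assumed linear fan curve using the module defaults (integer math, truncating).
fan_speed_percent() {
    temp="$1"
    min_temp=30
    max_temp=40
    min_speed=20
    if [ "$temp" -le "$min_temp" ]; then
        echo "$min_speed"                # at or below minTemp: minimum speed
    elif [ "$temp" -ge "$max_temp" ]; then
        echo 100                         # at or above maxTemp: full speed
    else
        echo $(( min_speed + (temp - min_temp) * (100 - min_speed) / (max_temp - min_temp) ))
    fi
}

for t in 25 30 35 38 40 45; do
    echo "${t}C -> $(fan_speed_percent "$t")%"
done
```

With the defaults, each degree between 30 and 40 C adds 8 percentage points, so narrowing the temperature range makes the fan ramp more aggressively.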

UPS

| Option | Type | Default | Description |
|---|---|---|---|
| braid.ups.enable | bool | false | Enable UPS support via NUT (single-host standalone) |
| braid.ups.name | string | "ups" | UPS identifier for upsd/upsc |
| braid.ups.driver | string | "usbhid-ups" | NUT driver; the USB default covers most home-NAS UPSes |
| braid.ups.port | string | "auto" | Driver port; auto finds the first matching USB UPS |

When enabled, NUT triggers an orderly poweroff on low battery (unwinding braid-online.service -> braid lock -> unmount) and pool-mutating commands (add/remove/remove-missing/replace) refuse to start unless the UPS reports verified utility power (OL). Only name is written to /etc/braid/config.json, so braid ups status and the TUI know which UPS to query; driver and port configure the NUT driver only. Non-USB drivers (apcsmart, snmp-ups) are an escape hatch and not first-class.

See UPS for the setup workflow and live status.

Full config example

Every option with its default (or a representative value for required/optional fields):

braid = {
  enable = true;
  # package -- defaults to nixosModules.default's pinned braid-cli-unwrapped;
  # set only to build the CLI yourself.
  mountPoint = "/mnt/storage";   # default
  poolAccessGroup = "storage";   # default; null to disable
  lockSystemdStopDeadlineSecs = 270;  # default; must stay below braid-online TimeoutStopSec

  # Tool version overrides -- the recommended setup omits nixpkgs `follows`, so
  # defaults come from braid's pinned nixos-26.05 (cache hit); `follows` is an
  # opt-out that tracks your nixpkgs but rebuilds braid (cache miss). See "Tool overrides".
  # packages.cryptsetup = pkgs.cryptsetup;
  # packages.btrfsProgs = pkgs.btrfs-progs;
  # packages.utilLinux = pkgs.util-linux;
  # packages.nut = pkgs.nut;
  # packages.smartmontools = pkgs.smartmontools;
  # packages.ethtool = pkgs.ethtool;

  autoScrub = {
    enable = true;       # default
    interval = "monthly"; # default; any systemd calendar expression
  };

  monitor = {
    enable = true;       # default
    interval = "5min";   # default
    beep = true;         # default
    alertCommand = null; # default; e.g. "curl -s -d 'alert' https://ntfy.sh/my-nas"
  };

  autoUnlock = {
    enable = false;  # default
    keyDevice = "/dev/disk/by-id/usb-Kingston_DataTraveler_XXXX-0:0";
    timeoutSec = 5;  # default
    allowDegraded = false; # default
  };

  autoSuspend = {
    enable = false;   # default
    wolInterface = "eno1";
    idleTime = 900;   # default (15 minutes)
  };

  fanControl = {
    enable = false;    # default; opt-in
    pwm = {
      platformDevice = "f71882fg.656";  # from pwmconfig (required)
      number = 2;                        # from pwmconfig (required)
      minStart = 65;     # from hddfancontrol pwm-test (required)
      maxStop = 60;      # from hddfancontrol pwm-test (required)
    };
    minTemp = 30;      # default
    maxTemp = 40;      # default
    minFanSpeedPercent = 20;  # default
    interval = "30s";  # default
  };

  ups = {
    enable = false;        # default; opt-in
    name = "ups";          # default
    driver = "usbhid-ups"; # default
    port = "auto";         # default
  };
};


Sharing and permissions

How braid manages mount point permissions and how to share the pool over the network. Read this when setting up file access for multiple users or configuring Samba/NFS.

Mount point permissions

After every mount-producing command (unlock, add, recover), braid sets the mount root to:

root:storage 2770

This means:

  • Owner (root): full access
  • Group (storage): read, write, execute
  • Others: no access
  • Setgid bit (2): new files and directories inherit the storage group

Any user in the storage group can read and write files under the mount point. The setgid bit ensures that files created by any group member are owned by the storage group, so all members can manage each other’s files.
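To see what mode 2770 looks like on disk without touching the pool, you can apply it to a scratch directory. This sketch assumes Linux with a stat that supports -c (GNU coreutils or BusyBox); the directory is a throwaway from mktemp, not the braid mount point.

```shell
#!/bin/sh
# Apply 2770 to a scratch directory and show the octal and symbolic forms.
demo=$(mktemp -d)
chmod 2770 "$demo"
mode=$(stat -c '%a %A' "$demo")
echo "$mode"
rm -rf "$demo"
```

On Linux this prints `2770 drwxrws---`: the lowercase `s` in the group triplet is the setgid bit combined with group execute, and the trailing `---` is the "no access for others" part.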

Adding users to the storage group

# configuration.nix
users.users.alice = {
  isNormalUser = true;
  extraGroups = [ config.braid.poolAccessGroup ];
};

users.users.bob = {
  isNormalUser = true;
  extraGroups = [ config.braid.poolAccessGroup ];
};

Using config.braid.poolAccessGroup instead of the literal "storage" keeps the reference correct if you customize the group name.

For network-facing services like Jellyfin or Plex that should only read a single subtree, prefer mounting that subvolume separately and using POSIX ACLs over adding the service to the storage group. See Mounting subvolumes for the recipe.

Custom group name

braid.poolAccessGroup = "nas";

The group is created automatically. All behavior (mount permissions, setgid) works the same with any valid Unix group name.

Disabling the storage group

braid.poolAccessGroup = null;

When null, braid does not create a group or set permissions on the mount point after unlock. You manage permissions yourself.

Umask note

The setgid bit on the mount root ensures new files get the correct group. But the creating user’s umask controls the permission bits. The default umask (022) produces files with mode 644 (owner write, group/other read-only).

For collaborative write access where all group members can edit each other’s files, set a more permissive umask for processes that write to the pool:

# In a Samba share definition (see below), force create mode handles this.
# For SSH users, set umask in their shell profile:
programs.bash.interactiveShellInit = ''
  umask 002
'';

With umask 002, new files are 664 (owner and group read-write, other read-only) and new directories are 775.
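The effect of the two umask values can be checked in a scratch directory. A small sketch, assuming Linux and a stat that supports -c; `modes_under_umask` is a hypothetical helper and the scratch directory has no setgid bit, so the directory modes match the plain figures quoted above.

```shell
#!/bin/sh
# Create a file and a directory under the given umask and print their modes.
modes_under_umask() {
    scratch=$(mktemp -d)
    (
        umask "$1"               # subshell: the caller's umask is left alone
        touch "$scratch/file"
        mkdir "$scratch/dir"
    )
    echo "$(stat -c '%a' "$scratch/file") $(stat -c '%a' "$scratch/dir")"
    rm -rf "$scratch"
}

echo "umask 022: $(modes_under_umask 022)"
echo "umask 002: $(modes_under_umask 002)"
```

The first number on each line is the file mode and the second is the directory mode. Inside the pool, directories additionally carry the setgid bit inherited from the mount root.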

Samba integration

Samba is not part of the braid module, but it works well with the braid mount point. Here is a declarative NixOS Samba config:

# configuration.nix
services.samba = {
  enable = true;
  openFirewall = true;

  settings = {
    global = {
      workgroup = "WORKGROUP";
      "server string" = "NAS";
      security = "user";
    };

    storage = {
      path = config.braid.mountPoint;
      browseable = "yes";
      "read only" = "no";
      "valid users" = "@${config.braid.poolAccessGroup}";

      # File permissions for Samba-created files
      "create mask" = "0664";
      "force create mode" = "0664";
      "directory mask" = "2775";
      "force directory mode" = "2775";
    };
  };
};

# Set Samba passwords (run once per user):
#   sudo smbpasswd -a alice

Key points:

  • valid users = @storage restricts the share to the storage group.
  • force create mode and force directory mode ensure group-writable permissions regardless of the client’s umask.
  • New files and directories inherit the storage group from the setgid bit braid sets on the mount root – a kernel behavior that does not require inherit permissions. force directory mode = 2775 keeps that setgid bit on Samba-created subdirectories so inheritance carries down the tree.
  • Samba users must also be system users in the storage group.

Multiple shares

Create separate shares for different directories under the mount point:

services.samba.settings = {
  photos = {
    path = "${config.braid.mountPoint}/photos";
    browseable = "yes";
    "read only" = "no";
    "valid users" = "@${config.braid.poolAccessGroup}";
    "create mask" = "0664";
    "force create mode" = "0664";
    "directory mask" = "2775";
    "force directory mode" = "2775";
  };

  media = {
    path = "${config.braid.mountPoint}/media";
    browseable = "yes";
    "read only" = "yes";  # read-only share
    "valid users" = "@${config.braid.poolAccessGroup}";
  };
};

Binding shares to the pool lifecycle

By default, samba-smbd.service (the systemd unit NixOS creates from services.samba.enable) keeps running after braid lock. If a client is mid-transfer when you lock, umount blocks until the file handle is released. Wire the share into the pool lifecycle so systemd starts samba-smbd after braid unlock and stops it again before braid lock runs umount:

systemd.services.samba-smbd = {
  # Start smbd when braid marks the pool online after a successful unlock.
  wantedBy = [ "braid-online.service" ];
  # Stop smbd when braid-online stops, before braid lock unmounts the pool.
  bindsTo = [ "braid-online.service" ];
  # Order smbd on the correct side of braid-online start and stop jobs.
  after = [ "braid-online.service" ];
  # Skip boot or direct starts when the braid mount point is not mounted.
  unitConfig.ConditionPathIsMountPoint = config.braid.mountPoint;
};

All four fields are load-bearing and do different jobs:

  • wantedBy – samba-smbd starts when braid-online.service starts (i.e. after braid unlock).
  • bindsTo – samba-smbd stops if braid-online.service stops or goes inactive (i.e. before braid lock runs umount).
  • after – ordering only, ensures samba-smbd is started/stopped on the correct side of braid-online.service.
  • ConditionPathIsMountPoint – skips activation when the braid mount point is only an offline directory, so any start the triad did not initiate cannot serve an unmounted pool.

braid lock walks systemctl show -P BoundBy braid-online.service (the reverse of BindsTo=) and stops every consumer this way before unmount, and ConditionPathIsMountPoint keeps them from restarting against an offline pool. This is the same pattern braid’s own scrub timer uses (see modules/braid/storage.nix).

The condition matters even with wantedBy: NixOS also starts Samba at boot through samba.target (which samba-smbd.service is wantedBy), and that boot edge would start smbd before any unlock. ConditionPathIsMountPoint is what stops it from serving the empty, offline mount directory. Only smbd serves files from the pool and can hold it busy during lock, so leave samba.target, nmbd, and winbindd untouched.

NFS

The same approach works for NFS. Export the braid mount point and control access at the network level:

services.nfs.server = {
  enable = true;
  exports = ''
    ${config.braid.mountPoint} 192.168.1.0/24(rw,sync,no_subtree_check,no_root_squash)
  '';
};

Adjust the subnet and options for your network. See exports(5) for the full option reference.

The same wantedBy + bindsTo + after + ConditionPathIsMountPoint pattern on braid-online.service (see “Binding shares to the pool lifecycle” under Samba above) applies to nfs-server.service if you want NFS to stop before braid lock runs umount and start again after braid unlock. As with Samba, the condition gates NixOS’s default nfs-server.service boot-start edge (wantedBy = [ "multi-user.target" ]) against an offline braid mount point.

Auto-suspend integration

If you enable braid.autoSuspend, active SMB and NFS connections automatically block suspend. This is auto-detected from whether services.samba or services.nfs.server is enabled in your NixOS config – no extra configuration needed.


Mounting subvolumes

Mount a btrfs subvolume at a custom path when a person or service should see one part of the pool as its own filesystem. Common examples are a friendlier path under /home, or a media path like /var/lib/jellyfin/media.

How braid mounts the pool

braid mounts the btrfs top-level subvolume (subvolid=5) at /mnt/storage by default. Treat that as the management mount: it is where you create subvolumes, run btrfs commands, and manage the whole pool. Consumer services do not need access to that mount root.

The subvol= mount idiom

btrfs can mount a subvolume directly with subvol=<path>. The btrfs docs describe the important isolation property this way: “the parent directory is not visible and accessible”, which is “similar to a bind mount”.

For braid, that means a service can see only movies at /var/lib/jellyfin/media without needing permission to traverse /mnt/storage.

Recipe: mount a subvolume at a custom path

Create the subvolume while the pool is unlocked:

sudo btrfs subvolume create /mnt/storage/movies

Find the btrfs filesystem UUID from braid status (look for the FSID: line; the JSON form is braid status --json and the field is fsid):

sudo braid status

Add a native systemd mount unit to your NixOS configuration:

systemd.mounts = [{
  what = "/dev/disk/by-uuid/<btrfs-fs-uuid>";
  where = "/home/dan/my-movies";
  type = "btrfs";
  options = "subvol=movies,ro,noatime";
  wantedBy = [ "braid-online.service" ];
  bindsTo = [ "braid-online.service" ];
  after = [ "braid-online.service" ];
}];

Field notes:

  • what points at the btrfs filesystem UUID, not an individual LUKS disk.
  • where is the path where the subvolume should appear.
  • type = "btrfs" selects the btrfs mount helper.
  • options selects the movies subvolume. ro is optional but recommended for read-only consumers.
  • wantedBy starts the mount when braid-online.service activates after braid unlock.
  • bindsTo is the load-bearing lifecycle edge. It puts the mount unit in BoundBy braid-online.service, which is what braid lock stops before unmounting the pool.
  • after orders mount startup after braid-online.service, so the btrfs /dev/disk/by-uuid symlink exists before systemd resolves what.

Rebuild and verify:

sudo nixos-rebuild switch
findmnt /home/dan/my-movies
systemctl show -P BoundBy braid-online.service

The escaped mount unit name, for example home-dan-my\x2dmovies.mount, should appear in BoundBy braid-online.service.
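On a systemd host, `systemd-escape -p --suffix=mount /home/dan/my-movies` prints the unit name for you. To show where the `\x2d` comes from, here is a simplified POSIX sh sketch of the same rule. It handles only dashes and slashes (the real tool escapes other characters too), and both function names are hypothetical.

```shell
#!/bin/sh
# Replace every occurrence of $2 in $1 with $3 using only parameter expansion.
str_replace() {
    rest="$1"
    out=""
    while [ "${rest#*"$2"}" != "$rest" ]; do
        out="${out}${rest%%"$2"*}$3"     # text before the match, then replacement
        rest="${rest#*"$2"}"             # continue after the match
    done
    printf '%s' "${out}${rest}"
}

# Simplified path-to-mount-unit escaping: dashes and slashes only.
path_to_mount_unit() {
    p="${1#/}"                            # drop the leading slash
    p="${p%/}"                            # and a single trailing slash
    p="$(str_replace "$p" '-' '\x2d')"    # literal dashes are escaped first
    p="$(str_replace "$p" '/' '-')"       # then path separators become dashes
    printf '%s.mount\n' "$p"
}

path_to_mount_unit /home/dan/my-movies
path_to_mount_unit /var/lib/jellyfin/media
```

Escaping the dashes before converting the slashes is what keeps a dash in a directory name distinguishable from a path separator in the unit name.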

subvol= vs bind mount

Both approaches can expose a subtree at another path. subvol= is the better default for braid because it is conventional btrfs configuration, it mounts the subvolume directly, and it does not require the consumer to traverse /mnt/storage.

Use a bind mount only when the consumer already has permission to traverse the source mount and you need the same mounted data at multiple paths.

Why not fileSystems with x-systemd.requires?

fileSystems is fstab-shaped. systemd’s fstab options can express Requires= and After=, but not an arbitrary BindsTo=braid-online.service edge. Without BindsTo, the mount is not listed in BoundBy braid-online.service, so braid lock will not stop it before unmounting the pool. Use native systemd.mounts for lifecycle-bound subvolume mounts. See ADR 018 for the lifecycle model.

Worked example: read-only access for Jellyfin

Create the media subvolume:

sudo btrfs subvolume create /mnt/storage/movies

Mount it where Jellyfin expects media:

systemd.mounts = [{
  what = "/dev/disk/by-uuid/<btrfs-fs-uuid>";
  where = "/var/lib/jellyfin/media";
  type = "btrfs";
  options = "subvol=movies,ro,noatime";
  wantedBy = [ "braid-online.service" ];
  bindsTo = [ "braid-online.service" ];
  after = [ "braid-online.service" ];
}];

Grant Jellyfin read-only traversal on the subvolume contents:

sudo setfacl -R    -m u:jellyfin:rx /mnt/storage/movies
sudo setfacl -R -d -m u:jellyfin:rx /mnt/storage/movies

Do not add jellyfin to storage. That would grant the daemon read-write access across the whole pool. The ACL above scopes read access to one subvolume, and the subvol= mount means Jellyfin does not need to traverse /mnt/storage itself.

Bind Jellyfin to the mount unit:

services.jellyfin = {
  enable = true;
  openFirewall = true;
};

systemd.services.jellyfin = {
  wantedBy = lib.mkForce [ "var-lib-jellyfin-media.mount" ];
  bindsTo = [ "var-lib-jellyfin-media.mount" ];
  after = [ "var-lib-jellyfin-media.mount" ];
  unitConfig.ConditionPathIsMountPoint = "/var/lib/jellyfin/media";
};

Bind the service to var-lib-jellyfin-media.mount, not directly to braid-online.service. That ensures Jellyfin starts only after its media path is mounted. During braid lock, systemd stops Jellyfin first, then the subvolume mount, then braid unmounts the management mount and closes LUKS.

The full triad pattern is the same lifecycle shape described in Sharing and permissions.

Verify:

sudo -u jellyfin ls /var/lib/jellyfin/media
sudo braid lock
systemctl is-active jellyfin.service
systemctl is-active var-lib-jellyfin-media.mount

Point the Jellyfin web UI at /var/lib/jellyfin/media. After braid lock, both units should be inactive and the LUKS devices should be closed.

Offline mountpoint safety

braid seals the pool mountpoint immutable (chattr +i) while the pool is offline, so a process writing /mnt/storage before the pool mounts fails with EPERM instead of silently landing data on the root filesystem (which the pool would then hide on mount). See ADR 028.

This boot seal covers only the pool mountpoint (/mnt/storage). It has two consequences for subvolume mounts:

  • Subvolumes mounted under /mnt/storage are inherently protected by the parent seal – the bare mountpoint is the sealed directory. This is the safe default; prefer it.
  • Subvolumes mounted at separate paths (like the /var/lib/jellyfin/media example above) are not auto-sealed. While the pool is offline the systemd.mounts unit is stopped, leaving a bare directory at that path. The unit you wired with bindsTo = braid-online.service does not write while offline, but any other process that writes the path while the pool is offline lands data on root and gets shadowed on the next mount – the same bug the boot seal fixes for /mnt/storage.

To protect a separate-path subvolume mountpoint, seal it manually with the explicit-path form while the pool is offline:

sudo braid seal-mountpoint /var/lib/jellyfin/media

This is the braid-native remedy (the appliance has no chattr on its PATH). It reports a non-zero exit if it could not protect the path, so a failed seal is visible. It is not self-healing – unlike the pool mountpoint, braid does not re-seal these paths on every boot, and braid doctor does not probe them. Re-run it after a reconfiguration that recreates the directory. To clear it later, use braid seal-mountpoint --unseal <path>.


Troubleshooting

Symptom-oriented index for common problems. Find your symptom below and follow the resolution.

Balance fails with “No space left on device”

btrfs balance needs temporary free space to relocate chunks. Braid balances convert both data and metadata profiles, so either side can hit ENOSPC even when there appears to be space available.

Fix: Free up empty data block groups first, then retry the original operation:

sudo btrfs balance start -dusage=0 /mnt/storage

The usage=0 pass relocates only completely empty data block groups, so it does not need temporary work space. Keep recovery balances data-only: metadata block groups are write headroom, and balancing them can hit metadata ENOSPC and force the filesystem read-only.

If the retry still fails, inspect data vs metadata usage:

sudo btrfs filesystem usage /mnt/storage

df’s “Used” and “Available” columns cannot distinguish data, metadata, and snapshot references, while braid status reports the same btrfs-derived capacity. In btrfs filesystem usage, compare the Data and Metadata used/size ratios to see which side is the bottleneck.

If there is enough temporary work space, a non-zero data threshold can reclaim nearly-empty groups, but it moves data:

sudo btrfs balance start -dusage=10 /mnt/storage

Pool won’t mount

Symptom: braid unlock fails because pool.json is missing or corrupted.

Fix: Rebuild UUID-keyed pool.json from disk labels and LUKS UUIDs. How you start depends on the state of pool.json – bare discover previews only when the file is absent; over a corrupt file it refuses and points you to discover --write.

If pool.json is missing – preview, then write:

sudo braid discover
# Shows discovered disks -- verify they look correct
sudo braid discover --write

If pool.json is corrupt or unreadable – skip the preview and rebuild in place (bare discover refuses corrupt state before scanning):

sudo braid discover --write

The corrupt rebuild preserves the original bytes at pool.json.corrupt-<RFC3339-UTC> before overwriting; do not remove it first.

Then unlock normally:

sudo braid unlock

discover scans /dev/disk/by-id/ for LUKS devices with braid-* labels and reconstructs the membership file. See Recovery scenarios for details.

If pool.json is healthy and UUID-keyed, discover --write refuses on purpose. Use braid add / braid remove / braid replace for normal membership changes. If you have deliberately decided to re-discover instead, move the file aside before running discover --write:

sudo mv /var/lib/braid/pool.json /var/lib/braid/pool.json.manual-backup
sudo braid discover --write

Interrupted operation (pending-op.json exists)

Symptom: braid commands fail with an error about a pending operation. This happens when a previous add, remove, remove-missing, or replace was interrupted (power loss, crash, killed process).

Fix: Use braid recover:

sudo braid recover

Recover reads the pending-operation journal, opens LUKS devices and mounts the pool if needed, probes the live btrfs topology, and rebuilds pool.json from actual state. It clears the journal only after the idle/no-paused recovery path succeeds.

Important

If recover refuses owed RAID1 replay because btrfs balance state is paused, running, or unknown, it leaves pending-op.json in place. Inspect btrfs manually before clearing recovery state.
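For the manual inspection, the balance state can be read from btrfs balance status. The helper below is a sketch, not a braid command: the name balance_state is hypothetical, and the matched phrases assume the wording that current btrfs-progs prints. Anything unrecognized is reported as unknown.

```shell
# Hypothetical helper, not a braid command.
# Classifies the text printed by `btrfs balance status <mount>` (read from
# stdin) as idle, running, paused, or unknown.
# Assumption: the phrases below match your btrfs-progs version.
balance_state() {
    status_text="$(cat)"
    case "$status_text" in
        *"No balance found"*) echo idle ;;
        *"is paused"*)        echo paused ;;
        *"is running"*)       echo running ;;
        *)                    echo unknown ;;
    esac
}

# Usage (needs root and a mounted pool):
#   sudo btrfs balance status /mnt/storage | balance_state
```

Treat anything other than idle as a reason to keep pending-op.json until you understand what btrfs is doing.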

If devices are missing (drive failure during the interrupted operation):

sudo braid recover --allow-degraded

For scripted/unattended recovery:

echo "my-passphrase" | sudo braid recover --passphrase-stdin

See Recovery scenarios for detailed walkthroughs.

Missing device after drive failure

Symptom: braid status shows a missing device. The pool may be mounted degraded or may fail to mount.

You have two options:

Option A: Replace the disk (rebuilds data onto a new disk)

# Find the old disk name from braid status
sudo braid replace --old toshiba2 \
  --new toshiba4=/dev/disk/by-id/ata-NEW_DRIVE_SERIAL

Replace copies data from surviving redundant copies onto the new disk. This restores full RAID1 redundancy. It can take hours for large disks.

Option B: Forget the missing device (no data rebuild)

# Find the missing device's btrfs devid from braid status
sudo braid remove-missing --missing-id 3

This removes the dead device entry from the btrfs filesystem. No data is rebuilt – you lose the redundant copy that was on the dead drive. The pool continues as a smaller array. Use this when you do not have a replacement disk available.

Auto-unlock fails

Symptom: Pool is not unlocked after reboot despite auto-unlock being configured.

Check the service logs:

journalctl -u braid-auto-unlock.service

Common causes:

  • USB device not found: The USB drive was not plugged in or the keyDevice path is wrong. Verify with ls /dev/disk/by-id/ | grep usb.
  • Keyfile not found: The USB filesystem does not contain braid.key at the root. The file must be named exactly braid.key.
  • Keyfile resolves outside mount: A symlink on the USB points outside /run/braid-key/. The service refuses this for security.
  • Timeout too short: The USB device takes longer to enumerate than timeoutSec. Increase it in your NixOS config.
  • Missing devices: If a pool disk is dead and allowDegraded = false (the default), auto-unlock exits with code 2. Set braid.autoUnlock.allowDegraded = true to allow degraded mount.
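The two keyfile causes above can be checked by hand against a mounted USB drive. The function below is a sketch that approximates those checks, not the service's actual code: the name check_keyfile is hypothetical, and the mount path you pass in is whatever location you mounted the USB at.

```shell
# Hypothetical pre-flight check, not braid's own code.
# Succeeds only if <mount>/braid.key exists and resolves to a regular
# file inside the mount.
check_keyfile() {
    mount_dir="$(realpath "$1")" || return 1
    key="$mount_dir/braid.key"
    if [ ! -e "$key" ]; then
        echo "keyfile not found: $key" >&2
        return 1
    fi
    resolved="$(realpath "$key")" || return 1
    case "$resolved" in
        "$mount_dir"/*) ;;   # still inside the mount
        *)
            echo "keyfile resolves outside mount: $resolved" >&2
            return 1
            ;;
    esac
    [ -f "$resolved" ]
}

# Usage, with the USB mounted at /mnt/usb:
#   check_keyfile /mnt/usb && echo "keyfile looks usable"
```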

See Auto-unlock for the setup guide.

Beeper won’t stop

Symptom: The PC speaker is beeping (initially every few seconds, then less often) due to a disk health alert.

Fix: Acknowledge the alert:

sudo braid ack

This stops the beep loop and clears the alert state. Then investigate the underlying problem:

sudo braid status
sudo braid doctor

braid commands blocked by “another operation in progress”

Symptom: braid unlock, braid add, or braid recover fails with a message about another braid operation holding the pool lock.

The pool-mutating commands acquire an exclusive lock on /run/braid-pool.lock. If a previous command is still running (or crashed without releasing the lock), new commands fail fast.

Fix: Wait for the running command to finish. If the previous command crashed, the lock is released automatically when its process exits: it is a flock on a file under /run/, and /run/ is a tmpfs that is also cleared on reboot. If you need to proceed before a reboot:

# Check if any braid process is still running
pgrep -a braid
# If nothing is running, the lock was released — retry your command
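Because the lock is a flock, you can also probe it directly with flock(1) from util-linux. This is a sketch under that assumption: the function name pool_lock_is_free is hypothetical, and the probe takes the lock for an instant, so a braid command started at that exact moment could fail fast. Use it only when you believe nothing is running.

```shell
# Hypothetical helper, not a braid command.
# Returns 0 if an exclusive flock on the lock file can be taken right now,
# meaning no other process holds it. The lock is dropped again immediately.
pool_lock_is_free() {
    lock_file="${1:-/run/braid-pool.lock}"
    flock -n "$lock_file" true
}

# Usage (run as root so the lock file is accessible):
#   pool_lock_is_free && echo "lock is free"
```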

Scrub won’t start

Symptom: systemctl status braid-scrub.timer shows the timer is inactive.

The scrub timer is lifecycle-bound to braid-online.service. It only runs while the pool is unlocked and mounted. If a scrub was cancelled by lock or shutdown, braid resumes the partial scrub the next time the pool comes online.

# Check pool state
sudo braid status
# If pool is offline, unlock it
sudo braid unlock
# Timer should now be active
systemctl status braid-scrub.timer

Scrub reported errors

Symptom: braid status shows Last scrub: <ts> (N errors) or braid monitor raised a btrfs error alert after a scrub.

The scrub error count braid reports is authoritative – braid parses it from btrfs scrub status. Journal lines are diagnostic clues, not a complete per-error ledger: the kernel emits scrub messages through rate-limited helpers, so a busy or bursty scrub can produce fewer journal lines than the count. A non-zero count with sparse or missing journal lines is not a braid bug – it usually means the kernel dropped log entries to stay under its rate limit.

Use the command printed under the scrub status, or run journalctl directly:

sudo journalctl -k --since '<scrub-start-time>' --grep 'BTRFS.*(at logical.*on (dev|mirror)|super block at physical)'

Output comes in two distinct grammars depending on whether the error is in a data/metadata extent or in a superblock copy.

Extent errors (data and metadata). Each affected sector may log a repair-summary line:

  • Corrected via RAID1 mirror: fixed up error at logical N on dev /dev/mapper/braid-X physical N (or ... on mirror N when the source mirror has no device). btrfs RAID1 read the healthy mirror and wrote it back over the bad copy. No file path – corrected lines give block coordinates only. A count consisting mostly of fixed up error lines means data integrity was preserved; investigate the disk that produced the bad reads.
  • Uncorrectable: unable to fixup (regular) error at logical N on dev X physical N (or ... on mirror N). RAID1 could not recover – the mirror was also bad or no mirror exists. The block is permanently damaged.

An uncorrectable extent error may also log an additional detail line that identifies what was lost. The detail emission is gated by a second rate-limit check, so it is not guaranteed to appear for every uncorrectable error. When present, the shapes are:

  • Data extent, path resolved. ... at logical N on dev X, physical N, root N, inode N, offset N, length N, links N (path: subdir/victim.bin). (path: ...) is relative to the affected btrfs subvolume root, not absolute. The kernel builds it from paths_from_inode() (reference/linux/fs/btrfs/scrub.c:457, reference/linux/fs/btrfs/backref.c:2125) and does not know what mount point exposes that subvolume. Prepend the mount point of the affected subvolume (default subvolume at /mnt/storage; named subvolumes wherever you configured them).
  • Data extent, path resolution failed. Same shape but ends ... path resolving failed with ret=N instead of (path: ...). Usually means the extent has no remaining inode references (file already deleted) or the inode lives in a snapshot rooted under a different subvolume than the search root.
  • Metadata. ... at logical N on dev X, physical N: metadata leaf|node (level N) in tree N. Tree-block corruption – no file path because the bad block lives in a btrfs tree, not in user data. Persistent metadata errors indicate disk failure.
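To turn the relative (path: ...) suffix into a path you can open, join it onto the mount point of the affected subvolume. The filter below is a sketch: the name scrub_paths is hypothetical, and it assumes the (path: ...) suffix closes the log line as in the shape shown above.

```shell
# Hypothetical helper, not a braid command.
# Reads kernel log lines on stdin and, for each one that ends with a
# "(path: ...)" suffix, prints the path joined onto the given mount point.
scrub_paths() {
    mount_point="${1:-/mnt/storage}"
    sed -n 's/.*(path: \(.*\))[[:space:]]*$/\1/p' |
    while IFS= read -r rel; do
        printf '%s/%s\n' "${mount_point%/}" "$rel"
    done
}

# Usage, for errors in the default subvolume:
#   sudo journalctl -k --since '<scrub-start-time>' | scrub_paths /mnt/storage
```

Pass the mount point of a named subvolume instead when the error belongs to one; the kernel line itself does not say which mount exposes it.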

Superblock errors. Logged as standalone messages from scrub_supers, not as repair-summary + detail pairs. The grammar is independent of the extent path:

  • super block at physical N devid N has bad csum
  • super block at physical N devid N has bad generation N expect N

Damage to one of the device’s superblock copies. Investigate the device (identified by devid), not a file.

For the path-resolution-failed case, you can try inode-resolve as a best-effort:

sudo btrfs inspect-internal inode-resolve <inode> /mnt/storage

This succeeds only if the inode still exists in the subvolume rooted at the supplied path. Deleted files, extents with no remaining references, or files that live in a different subvolume will still produce no result – the kernel logged “path resolving failed” for the same reason.

A non-zero error count after a scrub means at least one block failed its checksum or I/O. With btrfs RAID1, blocks with a healthy mirror are repaired automatically (counted as Corrected – the fixed up lines above); Uncorrectable means both copies were bad and the file (for data) or tree block (for metadata) is now damaged. The journal output is your best diagnostic surface, but treat it as evidence rather than a complete ledger: rely on the scrub count for “how many,” and on the journal for “what kind, and where the kernel could log it.” Restore affected files from backup and run braid ack once you have investigated.
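For a quick view of "what kind", the journal lines can be tallied by shape. This is a sketch: the names scrub_line_summary and _scrub_count are hypothetical, and the fixed strings are the ones quoted in this section. Because of kernel rate limiting, the tallies are lower bounds, not a replacement for the scrub count.

```shell
# Hypothetical helpers, not braid commands.
# _scrub_count: $1 = log text, $2 = fixed string to count lines for.
_scrub_count() {
    printf '%s\n' "$1" | grep -cF -- "$2" || true
}

# Reads kernel log text on stdin and prints one tally per line shape.
scrub_line_summary() {
    log_text="$(cat)"
    corrected="$(_scrub_count "$log_text" 'fixed up error at logical')"
    uncorrectable="$(_scrub_count "$log_text" 'unable to fixup (regular) error at logical')"
    superblock="$(_scrub_count "$log_text" 'super block at physical')"
    echo "corrected=$corrected uncorrectable=$uncorrectable superblock=$superblock"
}

# Usage:
#   sudo journalctl -k --since '<scrub-start-time>' | scrub_line_summary
```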

SMB/NFS service inactive after braid lock

Symptom: systemctl status samba-smbd.service (or nfs-server.service) shows inactive (dead) immediately after you ran braid lock.

This is intentional. On NixOS module installs, braid lock stops every service bound to braid-online.service via BindsTo=braid-online.service before it unmounts the pool. The cascade prevents busy-mount unmount failures.

Fix: Run braid unlock. It reactivates braid-online.service after mount, and systemd restarts every consumer that is also WantedBy=braid-online.service.

If the service does not restart on braid unlock, it is wired for the stop side (BindsTo) but not the start side (WantedBy). The recommended setup wires the share into the full pool lifecycle – see Binding shares to the pool lifecycle.

Recovery scenarios

Detailed walkthroughs for recovering from failures. Read this when braid status or another command tells you something is wrong, or when you are planning for failure ahead of time.

Overview: discover vs recover

braid has two recovery commands that solve different problems:

| Command | When to use | What it does |
| --- | --- | --- |
| braid discover --write | pool.json is missing or corrupted | Scans disk labels to rebuild pool.json |
| braid recover | pending-op.json exists (interrupted mutation) | Opens pool, probes live topology, rebuilds pool.json, and clears the journal after the idle/no-paused recovery path succeeds; preserves the journal when owed RAID1 replay finds a paused, running, or unknown balance state |

discover solves metadata loss – the CLI’s record of which disks belong to the pool is gone, but the disks themselves are fine. It reads LUKS labels (braid-<name>) and LUKS UUIDs from /dev/disk/by-id/ devices to reconstruct UUID-keyed membership.

recover solves interrupted operations – an add, remove, remove-missing, or replace was killed mid-flight (power loss, crash, OOM). The pending-operation journal (/var/lib/braid/pending-op.json) records what was in progress. Recover opens the pool, inspects what actually happened on disk, and rebuilds pool.json to match reality.

Lost pool.json

Symptom: braid unlock fails because /var/lib/braid/pool.json does not exist.

Cause: Accidental deletion, filesystem corruption, or migrating to a new NixOS install.

Steps

  1. Verify no pending operation exists:
ls /var/lib/braid/pending-op.json
# If this file exists, use `braid recover` instead (see below)
  2. Scan for braid disks:
sudo braid discover

Output looks like:

  toshiba1 = /dev/disk/by-id/ata-TOSHIBA_MN08ACA16T_XXXX
  toshiba2 = /dev/disk/by-id/ata-TOSHIBA_MN08ACA16T_YYYY
  toshiba3 = /dev/disk/by-id/ata-TOSHIBA_MN08ACA16T_ZZZZ
  3. Verify the output matches your expected pool members. Then write:
sudo braid discover --write

This creates /var/lib/braid/pool.json.

If you can name the expected member count ahead of time, record it from your own records or prior braid status output and pass it as a fail-closed guard:

EXPECTED=3
sudo braid discover --write --expect-count="$EXPECTED"
  4. Unlock normally:
sudo braid unlock

Notes

  • For a healthy UUID-keyed pool.json, discover --write refuses – use braid add / braid remove / braid replace to mutate membership instead.
  • For a corrupt or off-schema existing pool.json, discover --write rebuilds in place; no manual remove step is needed. The original bytes are preserved at pool.json.corrupt-<RFC3339-UTC> adjacent to the new file in case manual forensic recovery is needed (e.g. extracting a devid for a null_underlying member). The snapshot is a hard precondition: if it cannot be written (full disk, read-only state directory), discover --write refuses rather than destroy the corrupt original; free disk space or fix permissions and retry.
  • discover --write refuses to run if pending-op.json exists. Use braid recover instead. (Bare discover is read-only and runs regardless.)
  • discover only finds LUKS2 devices. LUKS1 devices with braid labels are skipped with a warning.
  • The rebuilt pool.json is keyed by LUKS UUID. Disk names are stored in each member value for command input and display.
  • When multiple /dev/disk/by-id/ symlinks point to the same device, discover picks the most stable one (wwn > nvme > scsi > ata > usb).
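The preference order in the last note can be illustrated with a small filter. This is a sketch of the ordering only, not braid's selection code: the name pick_stable_by_id is hypothetical, and matching on the link-name prefix is an assumption about how the categories are told apart.

```shell
# Hypothetical illustration of the preference order; braid's real
# selection logic may differ in detail.
# Reads by-id link names on stdin (one per line), prints the preferred one.
pick_stable_by_id() {
    names="$(cat)"
    for prefix in wwn nvme scsi ata usb; do
        match="$(printf '%s\n' "$names" | grep -- "^$prefix-" | head -n 1)"
        if [ -n "$match" ]; then
            printf '%s\n' "$match"
            return 0
        fi
    done
    return 1   # no link with a known prefix
}

# Usage: feed it the names of the links that resolve to one device.
```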

Interrupted add/remove/replace

Symptom: braid commands fail with a message about a pending operation. ls /var/lib/braid/pending-op.json confirms the journal file exists.

Cause: A pool mutation (add, remove, remove-missing, replace) was interrupted before it could complete. The journal records the operation type, the pre-operation membership, and the target membership. Existing-pool add journals also record a phase: PoolMutation for unfinished disk preparation or btrfs membership, and PostAddBalanceRaid1 after membership is committed but balance work remains.

Steps

  1. Preview what recover will do:
sudo braid recover --dry-run

This shows the recovery plan without making changes: which LUKS devices will be opened, whether the pool needs mounting, and the final pool.json state.

  2. Run recovery:
sudo braid recover

Recover will:

  • Open the LUKS devices needed for the journal phase
  • Mount the btrfs pool
  • Probe the live btrfs topology to determine what actually happened
  • For existing-pool add PoolMutation, first open and scan any already-committed journaled add targets that can be reconciled without wiping or adding
  • For add PoolMutation, finish only the journaled add targets that are not already live
  • For add PostAddBalanceRaid1, skip all disk preparation and btrfs add steps, then run the owed RAID1 balance only when btrfs balance state is idle; preserve pending-op.json when a paused, running, or unknown balance state requires manual inspection
  • Rebuild or repair pool.json only when live membership is complete
  • Clear pending-op.json only after required membership and balance work is complete
  3. Verify:
sudo braid status

Interrupted between returned-disk wipe and add

If an existing braid-labeled disk was being returned to the pool and the add was interrupted after wipefs --types btrfs but before btrfs device add, run:

sudo braid recover

Recover replays the add from the journaled returned-disk target. Do not wipe the disk and retry it as a fresh add; the journal still records the checked LUKS identity and expected pool FSID.

Interrupted fresh-disk add

For an interrupted fresh-disk add, recover replays the format, optional keyfile enrollment, LUKS header backup, mapper open, and btrfs device add from the journaled options when the disk is present.

If the disk is absent or has a different LUKS label than the journal records, recover fails and leaves pending-op.json in place. Reconnect the original disk or replace the target, then rerun sudo braid recover.

Pending-op file corruption

Symptom: braid reports that /var/lib/braid/pending-op.json could not be parsed.

The remediation phrase is:

Remove /var/lib/braid/pending-op.json after manual reconciliation (see docs/internals/luks-unlock.md) and re-run.

It is safe to remove pending-op.json only when one of these is true:

| Situation | Safe? |
| --- | --- |
| No disk-level mutation committed: no LUKS format, no btrfs device add, no cryptsetup open of a fresh-format target | Yes |
| braid status confirms the live pool already reflects the intended state and the journal is stale | Yes |
| A mutation is partially complete, such as mkfs.btrfs ran but btrfs device add did not, or replace is paused mid-rebuild | No |

When it is not safe, keep the journal in place and investigate the interrupted operation before editing state.

Out-of-band reformat during recovery

Recover refuses if a target disk’s live LUKS UUID no longer matches the journal. This catches a disk that was reformatted, swapped, or cloned after the original operation started.

Messages to search for:

  • add recovery aborted: target ... LUKS UUID mismatch
  • recover replace target '...' LUKS UUID mismatch: expected ..., found ...

Do not force the journal forward. Investigate the foreign reformat or swapped disk, restore the intended disk if possible, and rerun recovery.

See also Unlock refused by a foreign or mismatched disk for the same identity check on the braid unlock path.

Never-enriched member with null-underlying mapper

A member can be known to btrfs by devid while its LUKS backing device is gone (cryptsetup status reports device: (null)). If the member was never enriched with a persisted devid, recovery cannot bind that null-underlying mapper back to a UUID-keyed membership entry.

Let braid recover complete when it can preserve the member. The next read-side command observes the live devid and braid remove-missing becomes available again if the device is truly gone.

Duplicate or missing devid in journal snapshot

Recovery may refuse with internal errors equivalent to duplicate journaled devids or no member for a journaled devid. This means the journal snapshot cannot safely resolve a btrfs devid to a UUID-keyed member.

Do not edit pool.json; that resolution did not consult it. Re-run recovery only after manual reconciliation of pending-op.json.

Committed-but-closed add target

If the journaled add target is already a live pool member but its mapper is closed when recover starts, recover opens and scans it during the reconciliation pass. After the live-pool re-probe, the target is included in pool.json and is not re-added.

This can still prompt for the pool passphrase even when the pool is already mounted, because the target mapper may need to be opened before recover can discover that it already committed.

With missing devices

If a drive failed during the interrupted operation:

sudo braid recover --allow-degraded

Without --allow-degraded, recover exits with code 2 when devices are missing. The degraded flag allows mounting with missing devices so recovery can complete. Redundancy is reduced until the missing device is replaced.
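In a script, it helps to surface exit code 2 separately instead of retrying blindly. The wrapper below is a sketch: the name run_recover is hypothetical, and it only reports the outcome. It deliberately does not add --allow-degraded for you, since that decision should follow a check that the disk is really gone.

```shell
# Hypothetical wrapper, not a braid command.
# Runs the command given as arguments, prints a one-line outcome, and
# returns the command's exit code unchanged.
run_recover() {
    rc=0
    "$@" || rc=$?
    case "$rc" in
        0) echo "recovered" ;;
        2) echo "exit 2: devices may be missing; confirm before retrying with --allow-degraded" ;;
        *) echo "failed with exit code $rc" ;;
    esac
    return "$rc"
}

# Usage:
#   run_recover sudo braid recover
```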

Scripted recovery

For unattended recovery (e.g. from a remote script):

echo "my-passphrase" | sudo braid recover --passphrase-stdin

Or with a passphrase file:

sudo braid recover --passphrase-file /path/to/passphrase

Recover for a replace journal when the pool is already mounted

Symptom: sudo braid recover exits with recover refuses to probe an already-mounted pool when the journal records a replace ... and instructs you to run braid lock first.

Cause: The pool was mounted by something other than braid recover itself (typically a manual cryptsetup open + mount after a crash, since braid unlock and braid-auto-unlock.service both refuse to mount when a pending-op journal exists). For a replace journal, the kernel may have resumed an interrupted dev_replace on that mount session, leaving stale in-memory device state that recover cannot distinguish from real topology. The cycle that scrubs this state needs to unmount and remount, which is unsafe on a mount recover does not own.

Steps

sudo braid lock      # works with a journal present -- no pending-op preflight
sudo braid recover   # opens its own mount and runs the relock cycle

braid lock unmounts the pool and closes the LUKS mappers. braid recover then opens a fresh mount session, finishes any in-progress kernel dev_replace, and runs the umount-and-remount cycle that clears stale btrfs_fs_devices – the standard happy path for replace recovery.

Unlock refused by a foreign or mismatched disk

Symptom: braid unlock exits with LUKS UUID mismatch. A disk at a recorded by-id slot reports a LUKS UUID that differs from the one in pool.json; the error names the disk, its by-id path, and the expected vs found UUID.

Cause: The disk was swapped, cloned, or reformatted out of band, so its LUKS identity no longer matches the recorded member. This is a hard refusal during probing, before any mapper opens. --allow-degraded does not bypass it – that flag only covers missing disks, and this disk is present.

If the swap was unintended

Detach the foreign disk and reattach the original. braid unlock then succeeds.

If the swap was intentional

braid replace requires the pool mounted, but the present mismatched disk blocks the mount. Make the slot read as missing first, then replace:

  1. Detach the foreign disk so the member reads as absent.
  2. Mount the pool degraded:
    sudo braid unlock --allow-degraded
    
  3. Replace the now-missing member following Missing disk -> Option A: Replace the disk. braid replace prepares its own --new disk; see braid replace for how it handles a disk that already carries a LUKS header.

See also Out-of-band reformat during recovery for the same identity check on the braid recover path (a different trigger).

Missing disk (drive failure)

Symptom: braid status shows a device as missing. The pool may be mounted degraded or may refuse to mount.

Unlock with a missing disk

If the pool is not mounted:

sudo braid unlock --allow-degraded

This mounts the pool in degraded mode. All data is still accessible (btrfs RAID1 keeps a copy on the surviving disk(s)), but the pool is running with reduced redundancy until you replace the dead drive.

Hot-unplug while pool is mounted

If a drive is physically disconnected while the pool is mounted, its LUKS mapper can remain open with cryptsetup status reporting device: (null). btrfs continues to list the devid but has not yet promoted it to MISSING. braid status reports the devid – it contributes to missing_count and appears in missing_devids – but braid remove-missing --missing-id N and braid replace (with or without --missing-id) refuse the devid because they only act on btrfs-authoritative MISSING entries.

To make progress:

  1. Confirm the disk is truly gone (not just a loose cable).
  2. Relock and re-unlock the pool degraded so btrfs re-evaluates membership and promotes the devid:
    sudo braid lock
    sudo braid unlock --allow-degraded
    
  3. Re-run braid status – the devid should now appear as authoritatively MISSING – then retry braid remove-missing or braid replace.

Option A: Replace the disk

Replaces the dead disk with a new one, rebuilding data from surviving copies:

sudo braid replace --old toshiba2 \
  --new toshiba4=/dev/disk/by-id/ata-NEW_DRIVE_SERIAL

--old identifies the missing member. If you want to cross-check the btrfs devid from braid status, add --missing-id after the required args:

sudo braid replace --old toshiba2 \
  --new toshiba4=/dev/disk/by-id/ata-NEW_DRIVE_SERIAL \
  --missing-id 3

Replace runs btrfs replace start -B under the hood. braid replace is a long-running online operation: the command waits in the foreground and shows progress while the pool remains usable. It can take hours for large drives, so run it from a shell you can leave open (or a tmux/screen session). From another shell, braid status and braid tui can show progress independently.

Option B: Remove the missing device

Forgets the dead device without rebuilding data:

# Find the missing device's btrfs devid from braid status
sudo braid remove-missing --missing-id 3

Use this when you do not have a replacement disk. The pool continues with fewer disks and reduced capacity. Any data that existed only on the dead drive is lost, although under RAID1 all data has a second copy on another drive.

When this clears the last missing device and 2+ disks remain, remove-missing blocks on a follow-up soft RAID1 balance to restore redundancy on chunks written as single during degraded operation. You will see [wait] pool: restoring RAID1 redundancy... then [ok] pool: RAID1 redundancy restored before the command returns. The wait scales with how much data was written while degraded: an idle pool finishes in seconds, while a pool written to heavily during degraded mode can take longer. A sleep inhibitor is held for the entire operation. See braid remove-missing for the full sequence.

Verify:

sudo braid status

A successful result shows no missing devices and no single profile rows for data or metadata.
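As an independent cross-check, the same condition can be read from btrfs filesystem df. This is a sketch: the name no_single_profiles is hypothetical, and it assumes the usual "Data, RAID1: total=..., used=..." line format. GlobalReserve is always reported as single, so it is ignored.

```shell
# Hypothetical helper, not a braid command.
# Reads `btrfs filesystem df <mount>` output on stdin. Prints any Data or
# Metadata line that still uses the single profile and returns 1 if one
# is found; returns 0 otherwise.
no_single_profiles() {
    if grep -E '^(Data|Metadata), single:'; then
        return 1
    fi
    return 0
}

# Usage (needs a mounted pool):
#   sudo btrfs filesystem df /mnt/storage | no_single_profiles && echo "no single chunks"
```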

Choosing between replace and remove-missing

| | replace | remove-missing |
| --- | --- | --- |
| Requires new disk | Yes | No |
| Rebuilds data | Yes | No |
| Restores redundancy | Yes | Partial: restores RAID1 profiles when 2+ disks remain, but does not add replacement capacity |
| Duration | Hours (large disks) | Minutes |
| When to use | You have a replacement | No replacement available |

Degraded mount

A degraded mount means at least one pool disk is missing. The pool is usable, but it runs with reduced redundancy for the missing device’s share of data.

When degraded mounts happen

  • braid unlock --allow-degraded – explicit request
  • braid recover --allow-degraded – recovery with missing devices
  • braid.autoUnlock.allowDegraded = true – auto-unlock config

Risks

  • Reduced redundancy – the pool is short the missing device’s mirror copy of existing data, and on 2-disk pools new writes are allocated as single-profile chunks. A further drive failure could lose data.
  • No self-healing – btrfs cannot repair corrupted blocks from a redundant copy if the copy was on the missing device.

Resolution

Replace the missing disk as soon as possible:

sudo braid replace --old <missing-name> \
  --new <new-name>=/dev/disk/by-id/<new-drive>

After replace completes, the pool is fully redundant again.

Recovery decision tree

braid command fails
├── "pending operation" error
│   └── braid recover [--allow-degraded]
├── pool.json missing
│   └── braid discover --write → braid unlock
├── "LUKS UUID mismatch" error
│   └── see "Unlock refused by a foreign or mismatched disk"
├── missing device / won't mount
│   ├── braid unlock --allow-degraded
│   └── then: braid replace or braid remove-missing
└── something else
    └── braid doctor → check troubleshooting guide
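The first two branches of the tree depend only on which state files exist, so they can be sketched as a check. The name suggest_recovery is hypothetical. The function does not detect a corrupt pool.json, a LUKS UUID mismatch, or missing devices; those branches still need the error message or braid status.

```shell
# Hypothetical helper, not a braid command.
# Prints the first command to reach for, based only on state files.
suggest_recovery() {
    state_dir="${1:-/var/lib/braid}"
    if [ -e "$state_dir/pending-op.json" ]; then
        echo "braid recover"     # an interrupted mutation takes priority
    elif [ ! -e "$state_dir/pool.json" ]; then
        echo "braid discover"    # preview first, then discover --write
    else
        echo "braid doctor"
    fi
}

# Usage (reading the state directory may need root):
#   suggest_recovery /var/lib/braid
```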

State files reference

All state lives under /var/lib/braid/:

| File | Purpose |
| --- | --- |
| pool.json | UUID-keyed pool membership; each value stores disk name, by-id path, prior devid, and added-at timestamp |
| pending-op.json | UUID-keyed pending operation journal (present only during mutations) |
| acked-stats.json | Acknowledged btrfs device stats baseline |
| smartd-alert | Flag file set by smartd alert script |
| alert-latch.json | Active alert state |
| luks-headers/ | LUKS header backups |

braid add

Add one or more disks to the braid pool. Creates a new pool if none exists, or expands an existing one.

When to use it

  • Setting up a new NAS (bootstrap with one or more disks)
  • Expanding storage by adding a new drive to an existing pool

Basic example

sudo braid add toshiba1=/dev/disk/by-id/ata-TOSHIBA_MN07ACA12T_1234

Common variations

Bootstrap a new pool with two disks (creates RAID1 immediately):

sudo braid add \
  toshiba1=/dev/disk/by-id/ata-TOSHIBA_MN07ACA12T_1234 \
  toshiba2=/dev/disk/by-id/ata-TOSHIBA_MN07ACA12T_5678

Add a disk to an existing pool:

sudo braid add toshiba3=/dev/disk/by-id/ata-TOSHIBA_MN07ACA12T_9012

Preview what would happen without making changes:

sudo braid add toshiba1=/dev/disk/by-id/ata-TOSHIBA_MN07ACA12T_1234 --dry-run

Skip the confirmation prompt (for scripting):

sudo braid add toshiba1=/dev/disk/by-id/ata-TOSHIBA_MN07ACA12T_1234 --yes

Pass passphrase non-interactively:

echo -n 'hunter2' | sudo braid add toshiba1=/dev/disk/by-id/ata-TOSHIBA_MN07ACA12T_1234 --passphrase-stdin
sudo braid add toshiba1=/dev/disk/by-id/ata-TOSHIBA_MN07ACA12T_1234 --passphrase-file /tmp/pass.txt

Enroll a keyfile for auto-unlock from a mounted USB drive at the same time:

sudo braid add toshiba1=/dev/disk/by-id/ata-TOSHIBA_MN07ACA12T_1234 --enroll /mnt/usb

Mount the USB first so the --enroll directory refers to removable media, not persistent host storage.

Important flags

| Flag | Purpose |
| --- | --- |
| --dry-run | Show what would happen without executing |
| --yes | Skip interactive confirmation |
| --passphrase-stdin | Read passphrase from stdin instead of TTY prompt |
| --passphrase-file <path> | Read passphrase from a file (conflicts with --passphrase-stdin) |
| --enroll <dir> | Enroll braid.key from this directory into LUKS slot 1 on each adopted disk – fresh or returning – whose slot 1 is empty; idempotent skip if the keyfile already authenticates slot 1 |
| --luks-format-arg=<ARG> | Advanced: pass one raw argument to cryptsetup luksFormat, repeated once per argument; always use the equals form (e.g. --luks-format-arg=--pbkdf). braid refuses flags it manages itself – identity, key-material, integrity, and on-disk-layout options such as --uuid, --label, --type, --key-file, and offset/sizing flags. |
| --progress auto\|always\|never | Control progress display (default: auto) |

Disk spec format

Each disk is specified as NAME=PATH, where:

  • NAME is a short label you choose (e.g. toshiba1)
  • PATH is the /dev/disk/by-id/ stable device path

The name is stored in pool.json and used in LUKS mapper names (braid-toshiba1), LUKS labels, and all future commands. The persistent member identity is the LUKS UUID, not the name.
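When generating disk specs in a script, a quick shape check can catch typos before braid sees them. This is a sketch: the name parse_disk_spec is hypothetical, and braid's own validation may be stricter (for example about which characters a name may contain).

```shell
# Hypothetical validator; braid's own parsing rules may be stricter.
# Splits NAME=PATH on the first "=" and prints "NAME PATH" if the name is
# non-empty and the path is under /dev/disk/by-id/.
parse_disk_spec() {
    spec="$1"
    case "$spec" in
        *=*) ;;
        *) echo "not NAME=PATH: $spec" >&2; return 1 ;;
    esac
    name="${spec%%=*}"   # text before the first "="
    path="${spec#*=}"    # text after the first "="
    [ -n "$name" ] || { echo "empty name in: $spec" >&2; return 1; }
    case "$path" in
        /dev/disk/by-id/?*) ;;
        *) echo "path is not under /dev/disk/by-id/: $path" >&2; return 1 ;;
    esac
    printf '%s %s\n' "$name" "$path"
}

# Usage:
#   parse_disk_spec toshiba1=/dev/disk/by-id/ata-TOSHIBA_MN07ACA12T_1234
```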

What happens under the hood

  1. Probes each disk to determine its state (fresh, braid-labeled, or foreign)

  2. Shows a confirmation prompt with the disk’s name and by-id path, plus its model/size/serial (best-effort from the live device via lsblk – omitted if unavailable)

  3. For fresh disks: pre-generates a LUKS UUID, LUKS-formats the disk with the pool passphrase and braid-<name> label, enrolls the --enroll keyfile into slot 1 if provided, creates a LUKS header backup, and opens the LUKS mapper

    See Pending LUKS header backups – copy each .luksheader off-system and delete the local copy.

  4. If no pool exists: creates a btrfs filesystem (RAID1 if 2+ disks, single if 1 disk; braid explicitly pins the block-group-tree feature bit so that bit is visible and stable across toolchain versions – see ADR-027)

  5. If a pool exists: writes a phased UUID-keyed journal, adds the device to the existing btrfs filesystem, records the new membership in pool.json, then advances the journal to the balance phase

  6. If the pool now has 2+ disks: balances data to RAID1, then clears the journal – unless the pool has a missing device, in which case the balance is skipped (a [skip] note explains why). Redundancy is restored later by remove-missing or replace, not by the degraded add.

Keyfile enrollment (--enroll DIR): braid enrolls braid.key into LUKS slot 1 on every adopted disk – fresh or returning. On a fresh disk slot 1 is always empty, so the keyfile is always added. On a returning braid disk braid first probes the keyfile: if it already authenticates slot 1 the enrollment is an idempotent skip with no slot change and no new header backup; if slot 1 is empty the keyfile is added. (If slot 1 holds a different, unknown key braid refuses – see Safety checks.) The keyfile is added before the header backup so the backup captures slot 1.

A sleep inhibitor is held during all irreversible operations to prevent the system from suspending mid-operation.

If a btrfs exclusive operation (a running balance, device add/remove/replace, resize, or swap activate) is already in flight on the pool, braid does not fail – its btrfs commands queue behind the in-flight operation (via --enqueue) and the kernel runs them when the pool is free. A paused balance is the exception and is refused (see Safety checks below).

Disk acceptance rules

braid classifies each disk before acting:

  • Fresh disk (no LUKS): accepted. LUKS-formatted with the pool passphrase.
  • Returning braid disk (braid-labeled LUKS, btrfs FSID matches the pool): accepted as a recovery add. The disk is re-joined to the pool without reformatting. If the old btrfs signature would make btrfs device add refuse the disk, braid first runs the narrow wipe wipefs --all --types btrfs on the verified mapper, then uses btrfs device add -f.
  • Non-braid LUKS: refused. braid will not adopt a LUKS device it did not create.
  • Braid-labeled, wrong pool: refused. The disk belongs to a different btrfs filesystem.
  • Braid-labeled, no btrfs superblock: refused. The disk’s identity is ambiguous (could be partial init, clean eviction, or manual wipe). Wipe the disk and add as fresh.
  • Braid-labeled, pool not mounted: refused during bootstrap. Identity cannot be verified without a mounted pool.

Safety checks / refusal cases

  • Rejects duplicate disk names in the same command
  • Rejects disks that conflict with existing pool membership (same LUKS UUID, same name, or same by-id path)
  • Rejects absent disks (not plugged in)
  • Verifies the passphrase against an existing pool member before formatting new disks
  • Warns if the pool has missing devices but does not refuse: braid add still adds the new disk, but skips the RAID1 convert balance (it surfaces a [skip] note), so the pool stays degraded and redundancy is not restored at add time. To repair, either run braid replace --old <missing-name> --new <new-name>=/dev/disk/by-id/<...> to swap in a new disk for the missing member, or – on a 2-disk degraded pool where remove-missing alone would refuse (it cannot drop RAID1 below two devices) – run braid add then braid remove-missing to drop the dead member and rebalance onto the new disk.
  • Warns if existing pool drives have a keyfile but --enroll was not passed
  • With --enroll, refuses if an adopted disk’s LUKS slot 1 is occupied by an unknown key the keyfile does not authenticate – remove it first with cryptsetup luksKillSlot, then retry.
  • Refuses if a pending operation journal (pending-op.json) exists – run braid recover to reconcile.
  • Refuses if another braid operation is in progress (pool lock /run/braid-pool.lock is held) – retry once it finishes.
  • Refuses if a btrfs balance is paused on the pool – resume or cancel it first. A paused balance holds the exclusive-operation lock indefinitely, so braid cannot wait it out.
  • Refuses when UPS support is enabled and braid ups status cannot verify a trusted OL (utility-power) state.

Interrupted adds

Existing-pool adds recover in two phases:

  • PoolMutation: disk preparation and btrfs membership are not fully committed yet. braid recover may finish formatting a fresh target, re-open a verified returned target, run the narrow btrfs-signature wipe for that returned target, and run btrfs device add.
  • PostAddBalanceRaid1: membership and pool.json are committed. braid recover will not format, wipe, or add disks in this phase; it only mounts/probes the committed pool, repairs pool.json from the committed live topology if needed, runs the owed RAID1 balance when btrfs balance state is idle, and clears pending-op.json after that succeeds. A paused, running, or unknown balance state fails closed with the journal preserved.

braid remove

Remove a live disk from the pool. Data migrates off the disk before it is detached.

When to use it

  • Shrinking the pool (retiring a drive you no longer need)
  • Removing a drive that is still online and healthy

If the disk is already dead or missing, use braid replace to rebuild data onto a new disk, or braid remove-missing to forget the entry without rebuilding.

Basic example

sudo braid remove toshiba3

Common variations

Preview what would happen:

sudo braid remove toshiba3 --dry-run

Skip the confirmation prompt:

sudo braid remove toshiba3 --yes

Important flags

Flag                          Purpose
--dry-run                     Show what would happen without executing
--yes                         Skip interactive confirmation
--progress auto|always|never  Control progress display (default: auto)
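
The --dry-run and --yes flags combine naturally in a script: preview first, and run the real operation only if the preview succeeds. A minimal sketch, assuming the script runs as root and that --dry-run exits non-zero when braid would refuse (these docs do not state that). preview_then_apply and the BRAID variable are hypothetical names introduced for illustration.

```shell
# Hypothetical wrapper: preview a braid subcommand, then run it for real.
# BRAID is an illustrative override so the sketch can be exercised with a stub.
BRAID="${BRAID:-braid}"

preview_then_apply() {
  # Usage: preview_then_apply <subcommand> <args...>
  # Assumption: --dry-run exits non-zero when braid would refuse.
  if ! "$BRAID" "$@" --dry-run; then
    echo "preview failed; not running: braid $*" >&2
    return 1
  fi
  # --yes skips the interactive confirmation, so this line is reached
  # only after the preview has passed.
  "$BRAID" "$@" --yes
}
```

For example, preview_then_apply remove toshiba3 previews and then removes the disk from the basic example. braid remove-missing and braid replace accept the same two flags.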

What happens under the hood

  1. Probes the pool to verify the disk is a live member
  2. Checks that remaining disks have enough free space to absorb the data being migrated
  3. Shows a confirmation prompt with the disk’s name and devid, its model/size/serial (best-effort from the live backing device via lsblk – omitted if unavailable), and the resulting disk count (e.g. Pool: 3 disks -> 2 disks)
  4. If removing the second-to-last disk (going from 2 to 1): first balances the pool from RAID1 to single profile, then removes the device
  5. Runs btrfs device remove to migrate all data off the disk (this is the long-running step)
  6. Closes the LUKS mapper on the removed disk
  7. Updates pool.json to remove the member’s UUID entry

A sleep inhibitor is held during data migration and cleanup.

If a btrfs exclusive operation (a running balance, device add/remove/replace, resize, or swap activate) is already in flight on the pool, braid does not fail – its btrfs commands queue behind the in-flight operation (via --enqueue) and the kernel runs them when the pool is free. A paused balance is the exception and is refused (see Safety checks below).

Safety checks / refusal cases

  • Refuses if the pool is not mounted
  • Refuses if the named disk is not a live member of the pool (suggests braid replace --old <missing-name> --new <new-name>=/dev/disk/by-id/<...> or braid remove-missing if missing devices are detected)
  • Refuses to remove the last disk from the pool
  • Refuses if there are missing devices in the pool (resolve those first)
  • Refuses if remaining disks lack space to absorb the removed disk’s data (ENOSPC pre-flight)
  • Warns when removal leaves a single disk (no RAID1 redundancy)
  • Refuses if a pending operation journal (pending-op.json) exists – run braid recover to reconcile.
  • Refuses if another braid operation is in progress (pool lock /run/braid-pool.lock is held) – retry once it finishes.
  • Refuses if a btrfs balance is paused on the pool – resume or cancel it first. A paused balance holds the exclusive-operation lock indefinitely, so braid cannot wait it out.
  • Refuses when UPS support is enabled and braid ups status cannot verify a trusted OL (utility-power) state.

braid remove-missing

Forget a stale missing-device entry from the pool. This does NOT rebuild data – use braid replace for that.

When to use it

  • A disk has permanently failed and you want to clean up the pool metadata without replacing it
  • You have already recovered your data and just need btrfs to stop reporting the missing device

This is a destructive choice: any data that only existed on the missing disk is lost. If you want to rebuild data onto a new disk, use braid replace instead.

Basic example

Note: braid remove-missing operates only on btrfs-authoritative MISSING devids. A drive that is hot-unplugged while the pool is mounted contributes to missing_count and appears in missing_devids in braid status before btrfs promotes its devid to MISSING; remove-missing refuses the devid with a specific hot-unplug diagnostic until that promotion happens. See Hot-unplug while pool is mounted.

First, find the missing device’s ID:

sudo braid status

Then remove it:

sudo braid remove-missing --missing-id 3

Common variations

Preview what would happen:

sudo braid remove-missing --missing-id 3 --dry-run

Skip the confirmation prompt:

sudo braid remove-missing --missing-id 3 --yes

Important flags

Flag                          Purpose
--missing-id <devid>          Target missing device by btrfs devid (required)
--dry-run                     Show what would happen without executing
--yes                         Skip interactive confirmation
--progress auto|always|never  Control progress display (default: auto)

What happens under the hood

  1. Probes the pool to verify missing devices exist
  2. Validates that the specified devid is actually a missing device (not a live one)
  3. Resolves the btrfs devid to the UUID-keyed pool member whose persisted prior devid matches
  4. Shows a confirmation prompt with the disk name, devid, and the resulting disk counts (e.g. Pool: 2 present + 1 missing -> 2 disks)
  5. Writes a PoolMutation journal and runs btrfs device remove <devid> to clear the missing device entry
  6. Updates pool.json to remove the member’s UUID entry, then advances the journal to post-remove-missing maintenance
  7. If this was the last missing device and 2+ disks remain: runs a soft RAID1 balance (-dconvert=raid1,soft -mconvert=raid1,soft) to restore redundancy on any single-profile chunks created during degraded operation
  8. Clears the journal

A sleep inhibitor is held during the removal and the subsequent soft balance (if triggered).

If a btrfs exclusive operation (a running balance, device add/remove/replace, resize, or swap activate) is already in flight on the pool, braid does not fail – its btrfs commands queue behind the in-flight operation (via --enqueue) and the kernel runs them when the pool is free. A paused balance is the exception and is refused (see Safety checks below).

Safety checks / refusal cases

  • Refuses if the pool is not mounted
  • Refuses if no missing devices are detected
  • Refuses if the specified devid belongs to a live device (use braid remove for that)
  • Refuses if the specified devid is not a device in this pool
  • When more than one surviving device exists: refuses if surviving disks lack space to absorb the missing device’s allocations (ENOSPC pre-flight), or if that pre-flight cannot run. The pre-flight cannot run when the btrfs device usage probe fails to spawn, returns a nonzero exit, produces unparseable output, does not list the targeted missing devid, or reports an untrusted missing-device allocation shape (the targeted devid is listed more than once, carries an allocation profile braid does not model, or reports no positive Data/Metadata/System RAID1 row).
  • Refuses on a 2-disk RAID1 pool with one disk missing – the kernel refuses to drop a RAID1 pool below two devices. Use braid replace --old <missing-name> --new <new-name>=/dev/disk/by-id/<...> to repair the dead disk, or braid add first and then re-run.
  • Refuses if a pending operation journal (pending-op.json) exists – run braid recover to reconcile.
  • Refuses if another braid operation is in progress (pool lock /run/braid-pool.lock is held) – retry once it finishes.
  • Refuses if a btrfs balance is paused on the pool – resume or cancel it first. A paused balance holds the exclusive-operation lock indefinitely, so braid cannot wait it out.
  • Refuses when UPS support is enabled and braid ups status cannot verify a trusted OL (utility-power) state.

braid replace

Replace a disk with a new one using btrfs replace. Works for both live (still-online) and dead/missing disks.

When to use it

  • A disk has failed and you need to rebuild data onto a replacement
  • Proactively swapping a healthy disk for a larger or newer one

Basic example

The same invocation replaces a disk whether it is still live or already dead/missing. braid resolves --old against pool.json to find the member and detects its state automatically, so there is no mode to choose and --missing-id is never required:

sudo braid replace --old toshiba1 --new toshiba4=/dev/disk/by-id/ata-TOSHIBA_MN07ACA12T_NEW1

Common variations

Note: braid replace operates only on btrfs-authoritative MISSING devids. A drive that is hot-unplugged while the pool is mounted contributes to missing_count and appears in missing_devids in braid status before btrfs promotes its devid to MISSING; both an explicit --missing-id cross-check and the no-flag auto-resolve path refuse the devid with a specific hot-unplug diagnostic until that promotion happens. See Hot-unplug while pool is mounted.

Optionally assert which missing devid you expect (braid refuses if it disagrees with pool.json):

sudo braid replace \
  --old toshiba1 \
  --new toshiba4=/dev/disk/by-id/ata-TOSHIBA_MN07ACA12T_NEW1 \
  --missing-id 3

Preview what would happen:

sudo braid replace --old toshiba1 --new toshiba4=/dev/disk/by-id/ata-TOSHIBA_MN07ACA12T_NEW1 --dry-run

Enroll a keyfile from a mounted USB drive on the new disk:

sudo braid replace \
  --old toshiba1 \
  --new toshiba4=/dev/disk/by-id/ata-TOSHIBA_MN07ACA12T_NEW1 \
  --enroll /mnt/usb

Mount the USB first so the --enroll directory refers to removable media, not persistent host storage.

Pass passphrase non-interactively:

sudo braid replace --old toshiba1 --new toshiba4=/dev/disk/by-id/ata-TOSHIBA_MN07ACA12T_NEW1 --passphrase-file /tmp/pass.txt
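
A fixed path such as /tmp/pass.txt leaves the passphrase on disk after the command finishes. One way to limit that exposure, sketched here under assumptions, is a short-lived file created with mktemp and removed afterwards. run_with_passphrase_file and BRAID are hypothetical names. The file is written without a trailing newline, mirroring the echo -n form used for --passphrase-stdin in the braid unlock examples, on the assumption that braid reads the file contents verbatim.

```shell
# Hypothetical helper: run a braid subcommand with a temporary passphrase file.
BRAID="${BRAID:-braid}"

run_with_passphrase_file() {
  # Usage: run_with_passphrase_file <passphrase> <subcommand> <args...>
  local passphrase="$1" pf rc=0
  shift
  # umask 077 keeps the file private to the calling user (root under sudo).
  pf="$(umask 077 && mktemp)" || return 1
  printf '%s' "$passphrase" > "$pf"
  "$BRAID" "$@" --passphrase-file "$pf" || rc=$?
  rm -f "$pf"                      # remove the file whether braid succeeded or not
  return "$rc"
}
```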

Important flags

Flag                          Purpose
--old <name>                  Name of the disk to replace
--new <name>=<path>           Name and by-id path of the replacement disk
--missing-id <devid>          Optional cross-check for a dead-disk replace: assert the missing btrfs devid. braid refuses if it disagrees with the devid pool.json records for --old. Never required.
--enroll <dir>                Enroll braid.key from this directory into LUKS slot 1 on the new disk
--dry-run                     Show what would happen without executing
--yes                         Skip interactive confirmation
--passphrase-stdin            Read passphrase from stdin
--passphrase-file <path>      Read passphrase from a file (conflicts with --passphrase-stdin)
--luks-format-arg=<ARG>       Advanced: pass one raw argument to cryptsetup luksFormat, repeated once per argument; always use the equals form (e.g. --luks-format-arg=--pbkdf). braid refuses flags it manages itself – identity, key-material, integrity, and on-disk-layout options such as --uuid, --label, --type, --key-file, and offset/sizing flags.
--progress auto|always|never  Control progress display (default: auto)

What happens under the hood

For a fresh replacement disk (no LUKS):

  1. Pre-generates the replacement member’s LUKS UUID and LUKS-formats the new disk with the pool passphrase and a braid-<name> label
  2. Optionally enrolls a keyfile in slot 1
  3. Creates a LUKS header backup
  4. Opens the LUKS mapper

Then, for all replacements:

  1. Runs btrfs replace start to copy data from the old device (or its mirrors) to the new device
  2. Writes committed UUID-keyed membership to pool.json and advances the journal to post-replace maintenance
  3. For live replacements: closes the old disk’s LUKS mapper
  4. Resizes the new device to use its full capacity (important when the new disk is larger)
  5. For missing-disk replacements that clear the last missing device: runs a soft RAID1 balance to restore redundancy on any single-profile chunks
  6. Clears the journal

The fresh-disk path always produces a local LUKS header backup in step 3; the existing-LUKS path produces one only when --enroll actually adds slot 1, so an already-enrolled disk is a no-op with no new backup. See Pending LUKS header backups – copy each .luksheader off-system and delete the local copy.

A sleep inhibitor is held throughout the replace to prevent the system from suspending. Suspending mid-replace can corrupt the btrfs topology.

If a btrfs exclusive operation (a running balance, device add/remove/replace, resize, or swap activate) is already in flight on the pool, braid does not fail – its btrfs commands queue behind the in-flight operation (via --enqueue) and the kernel runs them when the pool is free. A paused balance is the exception and is refused (see Safety checks below).

Safety checks / refusal cases

  • Refuses if the pool is not mounted
  • Refuses if --old and --new are the same disk
  • Refuses if the new disk’s LUKS UUID is already in use by the pool (registered membership or live btrfs devices) – detach the conflicting disk before retrying
  • Refuses if the new disk is absent (not plugged in)
  • Refuses if the new disk’s mapper capacity is smaller than the source disk’s btrfs total_bytes (read via BTRFS_IOC_DEV_INFO, the same value btrfs replace start compares against). For existing LUKS targets, mapper capacity is derived from the LUKS2 segment offset and size (dynamic means raw - offset, fixed means the segment size). For fresh-LUKS targets, braid uses cryptsetup’s default 16 MiB offset; offset-affecting --luks-format-arg flags (--offset/-o, --align-payload, --luks2-metadata-size, --luks2-keyslots-size, --sector-size) are rejected for this reason.
  • For live replacements: refuses if the pool has missing devices (resolve those first)
  • For missing replacements: refuses if --missing-id points to a live device
  • For missing replacements: refuses if --missing-id disagrees with the devid pool.json records for --old (--old already identifies which member to rebuild)
  • For missing replacements: refuses if pool.json has no recorded devid for --old. --missing-id cannot substitute for a missing record; it must match the recorded devid
  • Verifies the passphrase against an existing pool member before formatting
  • Warns before confirmation and in --dry-run if the live source device has I/O errors (informational, does not block)
  • Warns if existing pool drives have a keyfile but --enroll was not passed
  • Refuses if a pending operation journal (pending-op.json) exists – run braid recover to reconcile.
  • Refuses if another braid operation is in progress (pool lock /run/braid-pool.lock is held) – retry once it finishes.
  • Refuses if a btrfs balance is paused on the pool – resume or cancel it first. A paused balance holds the exclusive-operation lock indefinitely, so braid cannot wait it out.
  • Refuses when UPS support is enabled and braid ups status cannot verify a trusted OL (utility-power) state.

braid unlock

Open LUKS devices and mount the btrfs pool.

When to use it

  • After a boot, to unlock and mount the pool

Unless you’ve configured braid to automatically unlock on boot (braid.autoUnlock), you must use braid unlock to mount and access the pool.

Basic example

sudo braid unlock

You will be prompted for the pool passphrase.

Common variations

Pass passphrase non-interactively (useful for scripts or remote unlock):

echo -n 'hunter2' | sudo braid unlock --passphrase-stdin
sudo braid unlock --passphrase-file /tmp/pass.txt

Unlock with a binary keyfile from a mounted USB drive (e.g. for auto-unlock via systemd):

sudo braid unlock --key-file /mnt/usb/braid.key

Mount the USB first so the keyfile path refers to removable media, not persistent host storage.

Mount in degraded mode when a disk is missing:

sudo braid unlock --allow-degraded

Preview what would happen:

sudo braid unlock --dry-run

Important flags

Flag                      Purpose
--passphrase-stdin        Read passphrase from stdin instead of TTY prompt
--passphrase-file <path>  Read passphrase from a file (conflicts with --passphrase-stdin)
--key-file <path>         Unlock with a binary keyfile instead of passphrase (conflicts with passphrase flags)
--allow-degraded          Allow mounting with missing devices (degraded mode)
--dry-run                 Show what would happen without executing

What happens under the hood

  1. Checks that no other braid operation is pending
  2. Probes each UUID-keyed member in pool.json: checks whether the by-id device is present, whether its LUKS UUID matches, and whether its LUKS mapper is already open
  3. Verifies the selected credential against every disk it will unlock before opening any mapper
  4. Opens LUKS mappers for all locked disks using the verified credential
  5. Runs btrfs device scan to let the kernel discover all pool members
  6. Mounts the btrfs filesystem with noatime, skip_balance, and subvolid=5
  7. If any disks are unavailable and --allow-degraded is set: mounts with the degraded option
  8. After mount: enriches pool.json with live btrfs device IDs and related metadata – best-effort
  9. Checks for a paused balance and prints a warning if one is found

If all mappers are already open and the pool is already mounted, unlock is a no-op.
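
To confirm the mount after an unlock, the pool’s entry in /proc/self/mountinfo can be inspected. A minimal sketch, not braid’s own probe: pool_mount_options is a hypothetical helper, /mnt/storage is the mount point used in the status examples, and the second argument exists only so the parser can be pointed at a sample file. Which options the kernel echoes back for a given mount is an assumption.

```shell
# Hypothetical helper: print the mount options of a btrfs mount point.
# Prints "<per-mount options>,<superblock options>"; returns 1 if the
# path is not a btrfs mount point.
pool_mount_options() {
  local mp="${1:-/mnt/storage}" mountinfo="${2:-/proc/self/mountinfo}"
  awk -v mp="$mp" '
    $5 == mp {
      # Optional fields end at a lone "-"; fs type, source, and
      # superblock options follow it.
      for (i = 7; i <= NF; i++) if ($i == "-") break
      if ($(i + 1) == "btrfs") { print $6 "," $(i + 3); found = 1 }
    }
    END { exit !found }
  ' "$mountinfo"
}
```

Mount points containing spaces appear escaped (\040) in mountinfo, so this sketch only handles paths without them.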

On NixOS module installs

After a successful mount, braid unlock activates braid-online.service. Any unit you have wired into the pool lifecycle with WantedBy=braid-online.service (e.g. an SMB or NFS unit – see Sharing and permissions) starts as part of that activation. braid lock stops them again on the way down via the matching BindsTo=braid-online.service.

Standalone CLI installs (no NixOS module) skip this – there is no braid-online.service to activate.

Degraded mode

When a disk is missing (physically absent or with an unreadable LUKS header), unlock refuses to mount by default. The error message names the affected disk and tells you to pass --allow-degraded.

In degraded mode, the pool mounts with reduced redundancy. New writes are NOT mirrored to the missing disk. You should repair the pool as soon as possible with braid replace --old <missing-name> --new <new-name>=/dev/disk/by-id/<...>.

The exit code is 2 when a degraded mount is refused (vs. 1 for other errors), so scripts can distinguish the two cases.
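
A script can branch on that exit code. The sketch below assumes it runs as root and unlocks with a keyfile; unlock_or_report and BRAID are hypothetical names, and the messages are illustrative.

```shell
# Hypothetical wrapper: unlock with a keyfile and report why it failed.
BRAID="${BRAID:-braid}"

unlock_or_report() {
  local keyfile="$1" rc=0
  "$BRAID" unlock --key-file "$keyfile" || rc=$?
  case "$rc" in
    0) echo "pool unlocked" ;;
    2) # Degraded mount refused: a disk is missing. Do not retry blindly;
       # --allow-degraded is an explicit operator decision.
       echo "degraded mount refused; check the named disk before using --allow-degraded" >&2 ;;
    *) echo "unlock failed with exit code $rc" >&2 ;;
  esac
  return "$rc"
}
```

The wrapper deliberately does not add --allow-degraded on its own, since mounting degraded reduces redundancy for new writes.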

Safety checks / refusal cases

  • Refuses if a pending operation journal (pending-op.json) exists – run braid recover to reconcile.
  • Refuses if another braid operation is in progress (pool lock /run/braid-pool.lock is held) – retry once it finishes.
  • Refuses to mount degraded without explicit --allow-degraded
  • Refuses if a present disk’s LUKS UUID does not match the UUID recorded in pool.json – the disk was swapped, cloned, or reformatted out of band. The error names the disk, its by-id path, and the expected vs found UUID. This is a hard error caught during the initial probe, before any mapper opens; --allow-degraded does not bypass it (that flag only covers missing disks). If unintended, detach the foreign disk and reattach the original; if the swap was intentional, see Unlock refused by a foreign or mismatched disk.
  • If any disk rejects the selected credential during verification, unlock fails before opening any mapper and names the failing disk. If another disk already accepted the same credential, that points to disk-specific credential drift outside braid.
  • Does not prompt for a passphrase if all mappers are already open (idempotent re-run)

braid lock

Unmount the btrfs pool and close all LUKS mappers.

When to use it

  • Before shutting down or rebooting (though systemd handles this automatically)
  • When you want to manually take the pool offline
  • Before physically removing a disk (after braid remove)

Basic example

sudo braid lock

Common variations

Preview what would happen:

sudo braid lock --dry-run

Important flags

Flag       Purpose
--dry-run  Show what would happen without executing

What happens under the hood

  1. Checks if the pool is mounted
  2. Checks that no btrfs exclusive operation (balance, device remove, etc.) is running. Skipped when the pool is not mounted.
  3. Unmounts the btrfs filesystem, retrying up to 3 times if the device is busy (covers the brief race after stopping SMB/NFS consumers, where the kernel has not yet released the last file descriptors)
  4. After a successful unmount, runs btrfs device scan --forget for the planned close-set mappers (member-owned plus any orphaned braid-* mappers from a prior crash) that still exist on disk, clearing the kernel’s device registry so stale references do not race with mapper close. Skipped when there is nothing left to forget.
  5. Classifies live mappers by LUKS UUID/devid ownership, then closes member-owned observed mapper names, retrying up to 3 times if the device is busy
  6. Scans for orphaned braid-* mappers not owned by UUID-keyed membership (e.g. from a prior crash) and closes those too

If the pool is already unmounted and all mappers are already closed, lock reports “pool already locked” and exits cleanly.

On NixOS module installs

When braid is installed via the NixOS module, braid lock also:

  • Stops braid-scrub.timer, braid-scrub-resume-trigger.service, and braid-scrub.service before unmount.
  • Stops any consumer wired into the pool lifecycle via BindsTo=braid-online.service (lock walks its reverse, BoundBy; e.g. an SMB or NFS unit you set up that way – see Sharing and permissions) before unmount.
  • Stops braid-online.service itself after a successful unmount.

braid unlock reverses the third step: it reactivates braid-online.service after mount, which restarts every consumer that is also WantedBy=braid-online.service. A consumer wired with only one of the two half-works – BindsTo stops it before lock, WantedBy restarts it after unlock – so wire both; the sharing guide shows the full setup.

Standalone CLI installs (no NixOS module) skip all three – there is no braid-online.service or scrub unit to stop.
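
On a module install, whether a consumer unit is wired both ways can be checked from the running system. A minimal sketch using systemctl show; check_pool_wiring and the SYSTEMCTL override are hypothetical names, and the unit name you pass is whatever consumer you set up.

```shell
# Hypothetical check: does a unit carry both halves of the pool wiring?
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

check_pool_wiring() {
  local unit="$1" target="braid-online.service" prop value status=0
  for prop in WantedBy BindsTo; do
    # --value prints only the property value: a space-separated unit list.
    value="$("$SYSTEMCTL" show -p "$prop" --value "$unit")"
    case " $value " in
      *" $target "*) echo "$unit: $prop=$target" ;;
      *) echo "$unit: missing $prop=$target" >&2; status=1 ;;
    esac
  done
  return "$status"
}
```

A non-zero return means the unit would either not stop before lock or not restart after unlock.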

Error handling

  • Refuses if another braid operation is in progress (pool lock /run/braid-pool.lock is held) – retry once it finishes.
  • If unmount fails after 3 retry attempts (e.g. a process has files open on the mount), lock skips btrfs device scan --forget and still attempts to close the LUKS mappers, reporting the failure
  • If a mapper close fails with “device busy” after unmount also failed, the error is downgraded to a warning (the root cause is likely the stuck unmount)
  • The hint lsof <mount_point> or fuser -vm <mount_point> is printed when unmount fails, to help identify the blocking process
  • If a scanned braid-* mapper’s backing LUKS UUID cannot be verified (for example because its backing device is gone or its LUKS header is unreadable), lock prints a [warn], leaves that mapper open instead of closing it, excludes it from both btrfs device scan --forget and the close step, and still exits cleanly. Re-run braid lock once the mapper’s LUKS UUID is readable. The literal cleanup incomplete summary line appears only under --dry-run; a real run surfaces the per-mapper [warn] and does not print pool already locked. See ADR-024.

braid seal-mountpoint

Note

Experimental 🧪

This command is experimental: the idea or implementation is still uncertain and may be removed, replaced, or overhauled before braid v1.0.

Set the immutable attribute (chattr +i) on the pool mountpoint while it is unmounted, so a process that writes the path before the pool mounts fails loudly with EPERM instead of silently landing data on the root filesystem (which the pool then hides when it mounts over it).

You rarely run this by hand: the NixOS module runs the bare form automatically from the braid-seal-mountpoint boot/activation unit. The explicit-path forms are maintenance levers. See ADR 028.

When to use it

  • Re-seal now after the doctor reports the mountpoint is mutable while offline (instead of waiting for the next boot or nixos-rebuild switch).
  • Seal a separate-path subvolume mountpoint the boot seal does not cover (e.g. /var/lib/jellyfin/media – see Mounting subvolumes).
  • Clear an orphaned old mountpoint after changing braid.mountPoint.

Basic example

Seal the configured mount point (what the boot unit runs):

sudo braid seal-mountpoint

Common variations

Seal a specific directory (e.g. an offline subvolume mountpoint):

sudo braid seal-mountpoint /var/lib/jellyfin/media

Clear the immutable attribute on a path (e.g. an orphaned old mount point):

sudo braid seal-mountpoint --unseal /mnt/old-storage

Important flags

Flag      Purpose
PATH      Seal a specific directory instead of the configured mount point
--unseal  Clear the immutable attribute instead of setting it (requires PATH)

What happens under the hood

  1. Opens the path as a directory (a non-directory is refused).
  2. Confirms the path is not currently a mountpoint (via statx’s STATX_ATTR_MOUNT_ROOT, on the same file descriptor, so a racing mount cannot cause it to seal a live filesystem root). A live mountpoint is skipped.
  3. Sets (or, with --unseal, clears) FS_IMMUTABLE_FL on the bare directory’s inode. The attribute is persistent – it survives unmount and reboot.

The pool mounts normally over a sealed directory; the mounted filesystem’s own root governs writes.
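
The attribute can be inspected by hand with lsattr -d, which lists an i flag for an immutable directory. A small sketch that wraps that check; is_immutable and the LSATTR override are hypothetical names, and the result is only meaningful for the bare directory while the pool is unmounted.

```shell
# Hypothetical check: is the directory itself marked immutable?
LSATTR="${LSATTR:-lsattr}"

# Returns 0 if immutable, 1 if not, 2 if lsattr failed.
is_immutable() {
  local line flags
  line="$("$LSATTR" -d "$1")" || return 2
  flags="${line%% *}"          # first field holds the attribute flags
  case "$flags" in
    *i*) return 0 ;;
    *)   return 1 ;;
  esac
}
```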

Exit codes

  • Bare form (braid seal-mountpoint) is best-effort and always exits 0 – boot must not fail on it. Problems are logged as warnings to the journal.
  • Explicit forms (braid seal-mountpoint <path> / --unseal <path>) report an honest desired-state exit code: exit 0 only if the path actually ends up in the requested state (immutable for seal, mutable for unseal), non-zero otherwise – so a manual seal that silently failed to protect a path is visible.
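
Because the explicit form’s exit code reflects the final state, a script can seal several paths and report exactly which ones remain unprotected. A sketch, assuming it runs as root; seal_all and BRAID are hypothetical names.

```shell
# Hypothetical helper: seal each given directory with the explicit-path form.
BRAID="${BRAID:-braid}"

# Prints the paths that did not end up sealed; returns non-zero if any failed.
seal_all() {
  local path failed=0
  for path in "$@"; do
    if ! "$BRAID" seal-mountpoint "$path"; then
      echo "not sealed: $path" >&2
      failed=$((failed + 1))
    fi
  done
  [ "$failed" -eq 0 ]
}
```

The bare form would not work here: it always exits 0, so the loop could never detect a failed seal.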

Error handling

  • --unseal refuses the currently configured mount point – it must stay sealed while the pool is offline. Change braid.mountPoint first, then unseal the old path.
  • --unseal acquires the pool lock and fails fast if another braid operation is in progress, so it cannot interleave with an unlock that would remount over the path mid-operation.
  • A live mountpoint is never touched (the explicit forms report this as a failure).
  • If the root filesystem does not support the immutable attribute, the bare form logs one clear warning and protection is unavailable (rare – only non-NAS roots like vfat/9p/nfs).

See also

  • braid doctor – warns when the offline mountpoint is mutable
  • braid lock – take the pool offline (the seal persists across the cycle)
  • braid unlock – mount the pool over the sealed directory

braid idle

Note

Experimental 🧪

This command is experimental: the idea or implementation is still uncertain and may be removed, replaced, or overhauled before braid v1.0.

Check if the pool has any active operations. Designed for autosuspend integration.

When to use it

  • As an autosuspend check to prevent the system from sleeping during a scrub or any btrfs exclusive operation (balance, device add, device remove, device replace, resize, swap activate)
  • In scripts that need to wait for the pool to be idle before proceeding

Basic example

sudo braid idle

Output when the pool is mounted and idle:

idle: pool is idle

Output when the pool is not mounted (still exit 0 – nothing to protect):

idle: pool is offline

Exit codes

Exit code  Meaning
0          Pool is idle, or pool is offline
1          Pool is busy (running op) or pool state could not be determined
2          Setup error – config could not be read

The busy reason is printed to stdout:

busy: scrub running (45%)
busy: balance running
busy: balance paused
busy: device add in progress
busy: device remove in progress
busy: device replace in progress
busy: resize in progress
busy: swap activate in progress
busy: unknown (<probe>: <error>)

Only the scrub line carries a percentage. The named btrfs operation states come from scanning /sys/fs/btrfs/*/exclusive_operation, which reports the active operation but not its progress.

busy: unknown (<probe>: <error>) is printed when a probe failed. The probe label is mountinfo for /proc/self/mountinfo, sysfs for /sys/fs/btrfs/*/exclusive_operation, or scrub for btrfs scrub status command/parser failures. The error text preserves the underlying diagnostic.

When the pool is offline (not mounted), exit code is 0 – there is nothing to protect, so suspend is safe.

braid idle must run as root. A non-root invocation exits 1 with error: braid must be run as root on stderr before config loading or any probe runs, with no stdout output. The streams disambiguate this from the documented exits above: exit 0 prints idle: on stdout, busy/probe-failure exit 1 prints busy: on stdout, and config-load exit 2 emits a config-error diagnostic.
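
For scripts that need to wait for the pool to go idle, the exit codes and streams described above are enough to build a polling loop. A minimal sketch, assuming it runs as root; wait_for_idle, BRAID, the default interval, and the attempt cap are hypothetical choices, not braid features.

```shell
# Hypothetical helper: poll `braid idle` until the pool is idle or offline.
BRAID="${BRAID:-braid}"

# Usage: wait_for_idle [interval_seconds] [max_attempts]
wait_for_idle() {
  local interval="${1:-30}" max_attempts="${2:-120}" attempt=0 out rc
  while [ "$attempt" -lt "$max_attempts" ]; do
    # Capture stdout only; stderr passes through to the caller.
    out="$("$BRAID" idle)" && rc=0 || rc=$?
    case "$rc:$out" in
      0:*)      printf '%s\n' "$out"; return 0 ;;       # idle or offline
      1:busy:*) printf 'waiting, %s\n' "$out" >&2 ;;    # busy or probe failure
      *)        return "$rc" ;;   # exit 1 without busy: (non-root) or exit 2
    esac
    attempt=$((attempt + 1))
    sleep "$interval"
  done
  return 1                        # still busy after max_attempts
}
```

Matching on the busy: prefix keeps a non-root invocation, which also exits 1 but prints nothing on stdout, from looping until the attempt cap.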

Autosuspend integration

braid idle is the activity check behind braid’s auto-suspend. You don’t write this check by hand: set braid.autoSuspend.enable = true and braid’s NixOS module generates the autosuspend services.autosuspend ExternalCommand check (BraidPool) for you. The generated command – bash -c '! timeout -k 2 10 braid idle', with fully qualified /nix/store paths for bash, timeout, and braid – handles the exit-code inversion autosuspend expects and a fail-closed inner timeout. Don’t reproduce it by hand: autosuspend runs the check outside braid’s wrapper, so bare braid/timeout are not on its PATH.

See the power management guide for setup, and ADR 016: Auto-Suspend for the exit-inversion table, the qualified-path requirement, and why timeout must sit inside the !-inverted command.

What happens under the hood

  1. Checks if the pool is mounted (via /proc/self/mountinfo)
  2. If not mounted: returns idle (exit 0)
  3. Reads /sys/fs/btrfs/*/exclusive_operation for any active exclusive operation on any btrfs filesystem: balance, balance paused, device add, device remove, device replace, resize, swap activate
  4. If sysfs reports a busy operation or the sysfs probe fails, returns immediately before probing scrub
  5. Probes scrub status via btrfs scrub status against the configured pool mount point, only after the sysfs scan is clean (scrub is not in the kernel exclusive-operation set, so sysfs cannot detect it)

When the host has more than one btrfs filesystem (e.g. a btrfs root in addition to the pool), an exclusive op on any of them keeps the system awake while the pool is mounted, and the busy: line above may name an op on the non-pool fs. This is intentionally conservative – see ADR 016: Auto-Suspend. Scrub detection is narrower: braid idle only checks for a scrub on the braid pool itself, so a scrub running on a non-pool btrfs (e.g. the btrfs root) is not detected and does not block suspend.
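
The sysfs scan in step 3 can be approximated from a shell. This is a sketch, not braid’s implementation: list_exclusive_ops is a hypothetical helper, and its optional argument overrides the sysfs root so the loop can be run against a sample directory. It assumes an idle filesystem reports none in that file.

```shell
# Hypothetical helper: print "<fsid>: <operation>" for every btrfs filesystem
# with an active exclusive operation. Returns 1 if any is busy, 0 otherwise.
list_exclusive_ops() {
  local root="${1:-/sys/fs/btrfs}" f op fsid busy=0
  for f in "$root"/*/exclusive_operation; do
    [ -r "$f" ] || continue        # glob matched nothing, or file unreadable
    op="$(cat "$f")"
    [ "$op" = "none" ] && continue # assumption: idle filesystems report "none"
    fsid="$(basename "$(dirname "$f")")"
    printf '%s: %s\n' "$fsid" "$op"
    busy=1
  done
  return "$busy"
}
```

Like braid idle, the loop covers every btrfs filesystem on the host, so a reported operation may belong to a filesystem other than the pool.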


braid status

Show pool health, per-disk detail, capacity, and operation progress.

When to use it

  • After unlocking, to verify everything is healthy
  • To check on a running scrub, balance, or replace
  • To find device IDs needed by other commands (--missing-id)
  • To investigate alerts or degraded state

Basic example

sudo braid status

Common variations

Machine-readable JSON output:

sudo braid status --json

Important flags

| Flag | Purpose |
|------|---------|
| --json | Output the full status report as JSON |

Output sections

Pool summary

Pool:     /mnt/storage
Status:   intact
FSID:     <uuid>
Profile:
  Data:      RAID1
  Metadata:  RAID1
  System:    RAID1

Status values:

| Status | Meaning |
|--------|---------|
| intact | All disks present, no issues |
| DEGRADED (N missing devices) | One or more disks are missing; redundancy is reduced on the missing device’s data |
| not mounted | Pool is offline (LUKS closed or not mounted) |

Profile section:

Profile: summarizes btrfs profiles per block-group type. btrfs profiles are per type, so Data, Metadata, and System can differ; see btrfs balance profiles for the background.

| Per-type rendering | Meaning |
|--------------------|---------|
| RAID1 (also RAID1C3, RAID1C4, RAID10) | Mirrored across drives; reads self-heal from the redundant copy. |
| DUP (same-disk copies; no disk redundancy) | Two copies on the same physical device, the default metadata/system profile on a 1-device pool. Survives bit-rot, not device failure. |
| single (no redundancy) (also RAID0 (no redundancy)) | One copy across the affected block groups. Checksums detect bit-rot, but corruption cannot be repaired. |
| single, RAID1 (not fully redundant) | Block groups for this type span more than one profile, typically after an interrupted balance or degraded writes. Run braid doctor for the right next step; doctor recommends a soft RAID1 balance on a healthy pool and braid replace first on a degraded pool. |
| unknown | No block groups of this type were reported. Check braid status advisories for a df probe failure. |
| RAID5, RAID6, or any unrecognized name | braid does not classify parity profiles or future btrfs profiles. The raw profile name is shown verbatim with no annotation so the operator can make their own call; braid only ever produces single, DUP, and RAID1. |

The whole Profile: section is omitted when the pool is not mounted or when btrfs filesystem df failed.

Alert banner

When a health alert is active, a banner appears at the top of the output:

ALERT -- disk health issue detected. Run 'braid ack' to acknowledge and silence.
  - btrfs device errors on toshiba1 (devid 1)
  - SMART health warning

Alert causes include btrfs device errors, missing devices, and SMART health warnings. Alerts are latched – they persist until acknowledged with braid ack, even if the underlying condition resolves.

Allocation table

Shows how data is distributed across block group types:

Allocation:
  Type       Profile  Used        Allocated
  Data       RAID1    1.20 TiB    1.50 TiB
  Metadata   RAID1    512.00 MiB  1.00 GiB
  System     RAID1    64.00 KiB   32.00 MiB

Capacity

Capacity:
  Total:  10.91 TiB (Estimated)
  Used:   1.20 TiB
  Free:   9.50 TiB

For RAID1, the total is estimated as the effective mirrored capacity (not raw disk sum). With mismatched disk sizes, the oversized portion of the largest drive cannot be fully mirrored. The estimate accounts for this.
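The effect of mismatched disk sizes can be illustrated with the commonly used two-copy RAID1 rule: usable space is bounded both by half the raw total and by what the other drives can mirror for the largest one. This is a sketch of that general rule, assuming it approximates braid's estimate; the exact calculation braid uses is not specified here, and the function name is hypothetical.

```python
def raid1_usable_bytes(device_sizes: list[int]) -> int:
    """Estimate usable capacity for a two-copy RAID1 pool."""
    if len(device_sizes) < 2:
        return 0  # a single device cannot hold a mirrored pair
    total = sum(device_sizes)
    largest = max(device_sizes)
    # Every chunk needs a copy on two different devices, so the largest
    # drive can only be used up to the combined size of the others.
    return min(total // 2, total - largest)
```

With three equal 4-unit drives the estimate is 6 units (half the raw sum). With drives of 10, 2, and 2 units it drops to 4, because only 4 units of the large drive can be mirrored.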

Total is omitted when the pool is degraded (the estimate would be misleading with missing devices).

Drives (compact listing)

Drives:
  toshiba1     sda  devid=1  present
  toshiba2     sdb  devid=2  present
  toshiba3     -    devid=3  missing

Each row shows the disk name, its short kernel device (e.g. sda), its btrfs devid, and its state. A disk not assembled into the live pool – missing, offline, or LUKS-mismatched – shows - for its device.

The devid column shows devid=N only when the live pool currently counts that device missing: a btrfs-MISSING device, or a hot-unplugged member whose backing device is gone (null-underlying). It falls back to - when no live devid exists – a persisted devid the live pool no longer counts missing, or a member with no recorded devid.

That devid is the input to the braid remove-missing --missing-id and braid replace workflows. As with the JSON missing_devids field, a transient hot-unplug devid shown here is refused by both braid remove-missing and braid replace until btrfs promotes the device to MISSING; see Hot-unplug while pool is mounted.

State values use the same vocabulary as Per-disk detail below, rendered lowercase and hyphenated in this compact list (e.g. missing, offline, luks-uuid-mismatch).

Balance progress

Shown only when a balance is running or paused:

Balance:  running, 3/10 chunks (30% complete)
Balance:  paused, 5/12 chunks (42% complete)

Last scrub result

Last scrub: Mon Jan  1 00:00:00 2024 (no errors)
Last scrub: Mon Jan  1 00:00:00 2024 (3 errors)
Last scrub: Mon Jan  1 00:00:00 2024 cancelled (will resume)
Last scrub: Mon Jan  1 00:00:00 2024 interrupted
Last scrub: never
Last scrub: running (45%)

A nonzero error count replaces (no errors) with (N errors) on a finished scrub, and prefixes the cancelled (will resume) and interrupted lines when a partial scrub recorded errors. When the count is nonzero, braid appends a copyable kernel-journal query for the per-error detail lines:

Last scrub: Mon Jan  1 00:00:00 2024 (3 errors)
  scrub error details:
  sudo journalctl -k --since '2024-01-01 00:00:00' --grep 'BTRFS.*(at logical.*on (dev|mirror)|super block at physical)'

The --since argument is the scrub’s start time. See Scrub reported errors for how to read the journal output – including corrected vs. uncorrectable lines and why the count can exceed the visible journal lines.

Per-disk detail

What each disk shows depends on whether it is a live pool member. A live pool member shows its device path, model, serial, LUKS UUID, btrfs I/O error counters (the btrfs: line), and a SMART verdict (the SMART: line). These last two are different layers: btrfs: is the filesystem’s own I/O accounting, SMART: is the drive’s self-report. Any other disk – missing, offline, UUID mismatch, header-unreadable, or unknown – shows a reduced set: its device path and btrfs: unknown (<reason>) / SMART: unknown (<reason>) lines in place of counters; a UUID-mismatch disk also shows its observed LUKS: UUID so the divergence is visible. Separately, any disk that needs attention – for example a missing disk, or a present member with nonzero error counters – gets an Action: line naming the next command (detailed below).

Disks:

  toshiba1          devid 1   present
    Device:  /dev/disk/by-id/ata-TOSHIBA_MN07ACA12T_1234
    Model:   TOSHIBA MN07ACA12T
    Serial:  1234ABC
    LUKS:    aaaaaaaa-1111-2222-3333-444444444444
    btrfs:   read 0 / write 0 / flush 0 / corruption 0 / generation 0
    SMART:   ok

  toshiba2          devid 2   present
    Device:  /dev/disk/by-id/ata-TOSHIBA_MN07ACA12T_5678
    Model:   TOSHIBA MN07ACA12T
    Serial:  5678DEF
    LUKS:    bbbbbbbb-1111-2222-3333-444444444444
    btrfs:   read 12 / write 0 / flush 0 / corruption 3 / generation 0
    SMART:   warning (2 reallocated)
    Action:  braid replace --old toshiba2 --new <new-name>=/dev/disk/by-id/<...>

  toshiba3          MISSING
    Device:  /dev/disk/by-id/ata-TOSHIBA_MN07ACA12T_9ABC  (not found)
    btrfs:   unknown (device absent)
    SMART:   unknown (device absent)
    Action:  braid replace --old toshiba3 --new <new-name>=/dev/disk/by-id/<...>

Disk states (compact Drives: list and detail view):

| State | Meaning |
|-------|---------|
| present | Disk is online and healthy |
| MISSING | Disk not found at its by-id path |
| OFFLINE | Disk is present and LUKS identity matches membership, but it is not assembled into the live pool |
| LUKS HEADER UNREADABLE | Device present but LUKS header cannot be read |
| LUKS UUID MISMATCH | Device present but its LUKS header UUID differs from the recorded member – swapped, cloned, or reformatted; run braid doctor |
| UNKNOWN | State could not be determined |

btrfs: line. A live, present pool member shows real btrfs counters (read / write / flush / corruption / generation). Every other disk shows btrfs: unknown (<reason>), where <reason> names why counters are unavailable: device absent, LUKS header unreadable, LUKS UUID mismatch, disk offline -- not in pool, or metadata unavailable. (This line was labeled Errors: before braid reported SMART; it was renamed to btrfs: so it reads as a sibling of the SMART: line, not the only error concept.)

SMART: line. A live, present pool member shows the drive’s SMART verdict: ok, warning, failing, or unknown. When the drive reports an out-of-spec attribute, the verdict carries a parenthetical listing the concern(s) – e.g. warning (2 reallocated) or warning (92 percentage used). The parenthetical follows the evidence, not the verdict word, so a failing drive whose attributes braid reads as non-nominal also lists them (failing (5 reallocated)); a bare failing/ok/unknown has no evidence to show. Every non-present disk shows SMART: unknown (<reason>) with the same reasons as the btrfs: line. The SMART verdict is independent of the btrfs counters: a drive can report clean btrfs I/O while SMART reads warning, and vice versa.

Action: line. When a disk needs attention, braid status appends an Action: line naming the next command, so you do not have to look it up:

| Condition | Action: line |
|-----------|--------------|
| Missing member, or a present member with nonzero error counts | braid replace --old <name> --new <new-name>=/dev/disk/by-id/<...> |
| Missing or errored device with no pool membership (foreign mapper) | foreign mapper detected -- run 'braid doctor' to investigate |
| LUKS UUID mismatch | disk was swapped, cloned, or reformatted; detach the foreign disk and reattach the original, or run 'braid replace' if the swap was intentional -- run 'braid doctor' for the expected vs observed UUID |
| LUKS header unreadable | run 'braid doctor' for recovery guidance |

Healthy present disks and disks in the OFFLINE or UNKNOWN state get no Action: line. These hints are human-output only; --json consumers derive their own remediation from the status and btrfs_errors fields (the JSON disks[] element has no action field).

See braid replace to rebuild a missing or failing disk and braid doctor for the guided recovery path.

Advisories

braid status may print one or more warning: lines above the pool summary. Each warning corresponds to an entry in the JSON advisories array.

Foreign filesystem at the mount point. When something other than the braid pool is mounted at the configured mount point (for example, a stale tmpfs or ext4 mount left by another tool), braid status reports Status: not mounted and names the actual filesystem type:

warning: /mnt/storage is mounted but fstype is ext4, not btrfs

Unmount the foreign filesystem before retrying braid unlock – otherwise unlock reports “pool already mounted” because something is in fact mounted at that path.

Pending recovery journal. When /var/lib/braid/pending-op.json exists, an interrupted add / remove / remove-missing / replace still needs to be reconciled. braid status prints the advisory whether or not the pool is mounted:

warning: interrupted operation detected (pending-op.json exists, started 2026-05-20T10:30:00Z) -- run 'braid recover' to reconcile

Run sudo braid recover to reconcile from live pool state; do not remove pending-op.json by hand except under the conditions documented in Pending-op file corruption. If the journal is unreadable, the advisory carries the canonical manual-reconciliation phrase instead – because braid recover cannot load an unparseable journal either:

warning: failed to parse pending-op.json: <detail>. Remove /var/lib/braid/pending-op.json after manual reconciliation (see docs/internals/luks-unlock.md) and re-run.

See Unparseable state-file reconciliation for the safe-to-remove conditions.

Pending LUKS header backups

When a header-mutating operation (braid add, braid replace, braid enroll) writes a local LUKS header backup to /var/lib/braid/luks-headers/<disk>.luksheader, braid status prints a warning until those files are removed:

warning: LUKS header backups exist in /var/lib/braid/luks-headers -- copy offsite and delete local copies

The local copy is a transient byproduct of the header-mutating operation, not the intended backup target. Copy each .luksheader file to an off-system location (USB, another machine, cloud key storage), then remove the local copy to silence the warning.

See LUKS header backup workflow for the full rationale.

ENOSPC risk on RAID1 pool. When an intact mounted pool is one disk-loss away from insufficient RAID1 chunk-pair space, braid status prints:

warning: ENOSPC risk: 1 of 3 devices have less than 1.00 GiB unallocated -- if a disk fails, the pool may be unable to allocate RAID1 chunks to restore redundancy. Add capacity with 'braid add', delete unneeded files or snapshots, or compact data chunks with 'btrfs balance start -dusage=50 <mount>' (data only; do not balance metadata).

For 2-disk pools, the warning fires when either device drops below the threshold because new RAID1 chunks need space on both devices. For 3+ device pools, braid simulates each possible single-disk loss and warns when any survivor set would have too little pairable unallocated space. The per-device threshold is min(1 GiB, 10% of total device bytes), matching btrfs’s effective data chunk size.
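The threshold and the two pool-size rules can be sketched as below. This is a hypothetical reading, not braid's code: it assumes "total device bytes" means each device's own size, and it assumes "too little pairable unallocated space" means fewer than two survivors at or above their threshold. Both assumptions are consistent with the sample warning above but are not confirmed by it.

```python
GIB = 1024 ** 3


def unallocated_threshold(device_bytes: int) -> int:
    """Per-device threshold: min(1 GiB, 10% of the device's total bytes)."""
    return min(GIB, device_bytes // 10)


def enospc_risk(devices: list[tuple[int, int]]) -> bool:
    """devices holds (total_bytes, unallocated_bytes) for each pool member."""
    if len(devices) < 2:
        return False  # assumption: no RAID1 pair to protect
    roomy = [unalloc >= unallocated_threshold(total) for total, unalloc in devices]
    if len(devices) == 2:
        # New RAID1 chunks need space on both devices.
        return not all(roomy)
    # Three or more devices: simulate each single-disk loss and require
    # at least two survivors with enough unallocated space to form a pair.
    for lost in range(len(devices)):
        survivors = [ok for i, ok in enumerate(roomy) if i != lost]
        if sum(survivors) < 2:
            return True
    return False
```

On a 3-device pool with one device below the threshold, losing either of the other two leaves only one roomy survivor, so the sketch reports risk.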

See Balance fails with No space left on device for recovery options.

Config-disk probe fault. While building per-disk detail, braid status probes every configured member’s LUKS header and its expected braid-<name> mapper. When that probe fails – a braid-<name> mapper hijacked by a foreign container, a backing-path mismatch, a LUKS1 header, or an unreadable/unspawnable cryptsetup call – the fault is recorded as an advisory naming the affected disk:

warning: disk 'disk2' mapper '/dev/mapper/braid-disk2' is open but not backed by the configured disk. Expected LUKS UUID ..., found ...

Unlike the mutating commands (add, replace, enroll), which fail closed on such a fault, status is the always-available read-only diagnostic: it stays non-fatal (exit 0) and still prints the full pool summary, capacity, and per-disk detail. The fault degrades a single member, not the whole report.

A member already live and healthy in the pool keeps its present row – its identity comes from the LUKS-UUID membership join, which tolerates mapper drift (decision 024) – and the advisory is its only flag. Only an affected member that is not live in the pool additionally gets an unknown disk row, so it is neither silently dropped from the detail section nor mislabeled missing in the compact Drives: list.

JSON output

--json produces a structured report suitable for monitoring tools. Key fields:

  • mount_point: the pool’s configured mount path (e.g. /mnt/storage) – the same value shown on the human-readable Pool: line. Always present, in both the mounted and not-mounted envelopes.
  • status: "intact", "degraded", or "not_mounted"
  • total_devices: total number of devices btrfs reports for the pool, as a number. Present when the pool is mounted; omitted in the not-mounted envelope.
  • present_count: number of member devices currently present, equal to total_devices - missing_count, as a number. Present when the pool is mounted; omitted in the not-mounted envelope.
  • missing_count: number of member devices counted as missing – the cardinality of the missing_devids array below (btrfs-MISSING devices plus null-underlying mappers whose backing device disappeared); 0 on a healthy pool. Present when the pool is mounted; omitted in the not-mounted envelope.
  • fsid: the btrfs filesystem UUID, as a string – the same value shown on the human-readable FSID: line, and distinct from a disk’s luks_uuid. Present when the pool is mounted (a mounted btrfs filesystem always has an FSID); omitted in the not-mounted envelope.
  • disks: array of per-disk reports – one element per disk braid knows about: present pool members (matched members and foreign live devices), plus configured disks that are not currently live pool members (reported as missing, offline, unknown, luks-header-unreadable, or luks-uuid-mismatch; see the status values below). The field list below describes a live pool member element (as in the example); diagnostic unpooled elements differ as called out per field and in the note after the example.
    • luks_uuid: the disk’s LUKS UUID – the persistent member identity. For a matched live pool member it equals the pool.json membership key; a foreign live pool device carries an observed UUID that is not in membership (paralleling its mapper-basename name). A luks-uuid-mismatch diagnostic row carries the observed on-disk UUID so the divergence is visible. Other non-live rows (missing, offline, unknown, luks-header-unreadable) report ""; correlate them by name, not luks_uuid.
    • name: operator-facing name (e.g. toshiba1). For a matched present member it is resolved via the UUID-keyed membership join; for a foreign present device it falls back to the mapper basename; for a non-present disk it is the configured name. For display/command selection, not identity.
    • by_id: stable /dev/disk/by-id/... hardware path – a runtime handle, not identity.
    • mapper: device-mapper name – a runtime handle, not identity. For a present pool member it is the observed live mapper; for a matched member this is normally braid-<name> but may have drifted (decision 024 tolerates mapper drift), so do not reconstruct it as braid-${name} or you will miss the drift. For a non-present disk (missing, offline, unknown, luks-header-unreadable, luks-uuid-mismatch) braid does not report an observed mapper, so it emits the expected braid-<name> derived from the configured name, paralleling the configured name and by_id on those rows.
    • underlying: current backing block device (e.g. /dev/sda), or null when the disk is not a live pool member.
    • devid: btrfs device ID as a number (e.g. 1), or null when the disk is not a live pool member.
    • status: one of present, missing, luks-header-unreadable, luks-uuid-mismatch, offline, unknown.
    • btrfs_errors: btrfs I/O error counters (read, write, flush, corruption, generation, all integers) – the filesystem’s I/O accounting. Present when btrfs device stats are available; omitted entirely otherwise – including for present disks when btrfs device stats fails (which also emits a btrfs device stats failed advisory). (This field was named errors before braid reported SMART; it was renamed so it reads as a sibling of smart, not the only error concept.)
    • smart: the drive’s own SMART self-report – a verdict plus supporting evidence, a different layer from btrfs_errors. Present for live pool members; omitted for disks with no backing path to probe. The object always carries health ("ok", "warning", "failing", or "unknown"). When SMART evidence is available it also carries a protocol discriminator ("sata" or "nvme") and the per-protocol counters – for SATA reallocated_sectors, pending_sectors, offline_uncorrectable; for NVMe media_errors, critical_warning, percentage_used, available_spare, available_spare_threshold – plus celsius when the drive reports a current temperature. A drive whose detail log is absent (or whose health is unknown) carries health alone. This field is diagnostic evidence only – it does not feed the alert latch (see the note under alert_causes).
{
  "name": "toshiba1",
  "mapper": "braid-toshiba1",
  "by_id": "/dev/disk/by-id/ata-TOSHIBA_MN07ACA12T_1234",
  "luks_uuid": "aaaaaaaa-1111-2222-3333-444444444444",
  "devid": 1,
  "underlying": "/dev/sda",
  "status": "present",
  "btrfs_errors": { "read": 0, "write": 0, "flush": 0, "corruption": 0, "generation": 0 },
  "smart": { "health": "ok", "protocol": "sata", "reallocated_sectors": 0, "pending_sectors": 0, "offline_uncorrectable": 0, "celsius": 26 }
}

A diagnostic unpooled disk (missing, offline, unknown, or luks-header-unreadable) reports "luks_uuid": "", "devid": null, "underlying": null, and no btrfs_errors or smart key. offline is present but not assembled; the others reach the same blank/null row shape because no live member row is available. Correlate these rows by name.

A config-disk probe fault (see the Advisories section above) on a member that is not live in the pool produces an unknown row of this shape – the advisory carries the cause. A member that is live keeps its present row and is flagged by the advisory alone, so not every probe fault adds an unknown row.
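Because the disks[] element has no action field, a --json consumer has to derive remediation itself. The sketch below mirrors only the first row of the human-output Action: table (a missing member, or a present member with nonzero counters); the function name is hypothetical and other states are deliberately left to braid doctor.

```python
def needs_replace(disk: dict) -> bool:
    """Return True when a disks[] element matches the 'braid replace' condition."""
    status = disk["status"]
    if status == "missing":
        return True
    if status != "present":
        # offline, unknown, luks-* rows: no replace hint in this sketch.
        return False
    # btrfs_errors is omitted when btrfs device stats are unavailable.
    counters = disk.get("btrfs_errors")
    return counters is not None and any(v > 0 for v in counters.values())
```

A consumer would typically collect the name of each matching element, since non-live rows report an empty luks_uuid.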

  • alert_active: boolean
  • alert_causes: array of alert cause objects. Omitted entirely when no alert is active (the key is absent, not []) – check the always-present alert_active boolean first, mirroring how advisories is “omitted when none”. When present, each object is tagged by a type discriminator:
    • { "type": "btrfs_device_errors", "devid": <number> } – btrfs I/O errors on that device.
    • { "type": "missing_device", "devid": <number> } – a device counted as missing.
    • { "type": "smartd_alert" } – a SMART health warning from smartd.
    • { "type": "computation_error", "detail": "<string>" } – braid could not compute alert state; detail explains.

The per-disk smart field does not feed the alert latch. The smartd_alert cause is driven by the smartd daemon’s flag (/var/lib/braid/smartd-alert; see ADR 014 and ADR 030), not by the live per-disk SMART probe that braid status runs. So a report can carry a degraded smart object ("health": "warning") while alert_active is false and no smartd_alert cause is present. This is intentional: the per-disk smart field is diagnostic evidence; smartd remains the alert source.
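A consumer can turn the alert fields into readable lines as sketched below. The function name and the wording of the output lines are hypothetical; the field names and type discriminators are the ones listed above. The sketch checks alert_active first and treats a missing alert_causes key as "no causes".

```python
def describe_alerts(report: dict) -> list[str]:
    """Render alert causes from a `braid status --json` report as text lines."""
    if not report.get("alert_active", False):
        return []
    lines = []
    # alert_causes is omitted entirely (not []) when no alert is active.
    for cause in report.get("alert_causes", []):
        kind = cause.get("type")
        if kind == "btrfs_device_errors":
            lines.append(f"btrfs device errors on devid {cause['devid']}")
        elif kind == "missing_device":
            lines.append(f"missing device devid {cause['devid']}")
        elif kind == "smartd_alert":
            lines.append("SMART health warning from smartd")
        elif kind == "computation_error":
            lines.append(f"alert state could not be computed: {cause['detail']}")
        else:
            # Keep unknown discriminators visible instead of dropping them.
            lines.append(f"unrecognized alert cause: {kind}")
    return lines
```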

  • advisories: array of human-readable advisory strings (omitted when none). See the Advisories section above for what currently produces them.
  • missing_devids: array of every devid counted in missing_count (btrfs-MISSING devices and null-underlying mappers whose backing device has disappeared). For destructive remove-missing / replace --missing-id workflows, see those commands’ notes – a null-underlying devid here will be rejected by those commands until btrfs promotes it to MISSING.
  • profile: object with data, metadata, and system arrays, present whenever btrfs reports block-group allocation and omitted when the pool is not mounted or btrfs filesystem df failed. Each array contains raw btrfs profile names such as single, DUP, RAID0, RAID1, RAID1C3, RAID1C4, RAID5, RAID6, RAID10, or an unrecognized name verbatim. Arrays use canonical domain order, not alphabetical order, so mixed data is ["single", "RAID1"], not ["RAID1", "single"]. An empty array means btrfs reported no block groups of that type.
  • capacity: total_bytes, used_bytes, free_bytes
  • allocation: array of block-group entries, one per allocated type. Each entry has bg_type (e.g. Data, Metadata, System), profile (raw btrfs profile name, same vocabulary as profile above), used_bytes, and allocated_bytes (both integers). Omitted when the pool is not mounted or btrfs filesystem df failed.

3-disk RAID1 profile:

"profile": {
  "data": ["RAID1"],
  "metadata": ["RAID1"],
  "system": ["RAID1"]
}

Single-disk bootstrap profile:

"profile": {
  "data": ["single"],
  "metadata": ["DUP"],
  "system": ["DUP"]
}

Mixed data after interrupted balance:

"profile": {
  "data": ["single", "RAID1"],
  "metadata": ["RAID1"],
  "system": ["RAID1"]
}

The human-facing redundancy annotations from the text output, such as (no redundancy), (same-disk copies; no disk redundancy), and (not fully redundant), do not appear in JSON. The JSON payload carries only the btrfs profile names braid observed; consumers apply their own policy.
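One possible consumer policy is to re-create the text annotations from the raw arrays. The sketch below follows the per-type rendering table in the Profile section; the function name is hypothetical, and treating every multi-profile array as "not fully redundant" is an assumption, since the table only shows the single, RAID1 case.

```python
NO_REDUNDANCY = {"single", "RAID0"}


def annotate_profiles(names: list[str]) -> str:
    """Render one profile array (data, metadata, or system) with an annotation."""
    if not names:
        return "unknown"  # no block groups of this type were reported
    if len(names) > 1:
        # Assumption: any mix of profiles is flagged, in the order given.
        return ", ".join(names) + " (not fully redundant)"
    name = names[0]
    if name == "DUP":
        return "DUP (same-disk copies; no disk redundancy)"
    if name in NO_REDUNDANCY:
        return f"{name} (no redundancy)"
    # Mirrored, parity, and unrecognized names are shown without annotation.
    return name
```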

  • balance: state object (idle, running, paused, unknown)
  • last_scrub: state object (never, running, finished, aborted, interrupted, unknown). For finished, aborted, and interrupted, started_at is an offset-free host-local ISO-8601 wall-clock timestamp (YYYY-MM-DDTHH:MM:SS) as reported by btrfs. It records Scrub started, or Scrub resumed after a resumed scrub, and is not directly comparable to UTC fields such as pending-operation started_at values ending in Z. The same three states also carry error_count (integer) – the count btrfs reported, the same number the text output renders as (N errors). The scrub error details: journalctl command from the text output is not part of the JSON (mirroring the profile annotations above); a --json consumer derives its own --since value from started_at.
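A --json consumer that wants the same journal query as the text output can build it from last_scrub, roughly as sketched here. The function name is hypothetical. The grep pattern is copied from the text output shown earlier, and replacing the T separator with a space to match that output's --since format is an assumption about what braid does internally.

```python
from typing import Optional

# Pattern copied from the "scrub error details:" hint in the text output.
SCRUB_GREP = "BTRFS.*(at logical.*on (dev|mirror)|super block at physical)"


def scrub_journal_command(last_scrub: dict) -> Optional[list[str]]:
    """Build a journalctl argv for scrub error details, or None if not applicable."""
    if last_scrub.get("state") not in {"finished", "aborted", "interrupted"}:
        return None  # only these states carry started_at and error_count
    if last_scrub.get("error_count", 0) == 0:
        return None  # the text output only prints the hint for nonzero counts
    # started_at is host-local wall-clock time, e.g. 2024-01-01T00:00:00.
    since = last_scrub["started_at"].replace("T", " ")
    return ["journalctl", "-k", "--since", since, "--grep", SCRUB_GREP]
```

Because started_at is offset-free local time, the command should run on the same host (or at least in the same timezone) as the scrub.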

A complete report for a healthy 3-disk RAID1 pool (the disks array is abbreviated to one element here):

{
  "mount_point": "/mnt/storage",
  "status": "intact",
  "total_devices": 3,
  "present_count": 3,
  "missing_count": 0,
  "profile": {
    "data": ["RAID1"],
    "metadata": ["RAID1"],
    "system": ["RAID1"]
  },
  "fsid": "f5f5f5f5-aaaa-bbbb-cccc-d0d0d0d0d0d0",
  "capacity": {
    "total_bytes": 18000000000000,
    "used_bytes": 6000000000000,
    "free_bytes": 12000000000000
  },
  "last_scrub": {
    "state": "finished",
    "started_at": "2026-05-01T03:00:00",
    "error_count": 0
  },
  "balance": { "state": "idle" },
  "allocation": [
    { "bg_type": "Data", "profile": "RAID1", "used_bytes": 6000000000000, "allocated_bytes": 6500000000000 },
    { "bg_type": "Metadata", "profile": "RAID1", "used_bytes": 8000000000, "allocated_bytes": 9000000000 },
    { "bg_type": "System", "profile": "RAID1", "used_bytes": 65536, "allocated_bytes": 33554432 }
  ],
  "disks": [
    {
      "name": "toshiba1",
      "mapper": "braid-toshiba1",
      "by_id": "/dev/disk/by-id/ata-TOSHIBA_MN07ACA12T_1234",
      "luks_uuid": "aaaaaaaa-1111-2222-3333-444444444444",
      "devid": 1,
      "underlying": "/dev/sda",
      "status": "present",
      "btrfs_errors": { "read": 0, "write": 0, "flush": 0, "corruption": 0, "generation": 0 },
      "smart": { "health": "ok", "protocol": "sata", "reallocated_sectors": 0, "pending_sectors": 0, "offline_uncorrectable": 0, "celsius": 26 }
    }
  ],
  "alert_active": false
}

When the pool is not mounted, every mounted-only field above (total_devices, present_count, missing_count, profile, fsid, capacity, last_scrub, balance, allocation) is omitted, leaving mount_point, status ("not_mounted"), disks ([]), and alert_active. advisories and alert_causes still follow their skip-when-empty rule, so a latched alert or a pending-operation advisory can still appear on an offline pool.
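A consumer therefore has to branch on status before touching any mounted-only field. The following minimal sketch (hypothetical function name and output format) reads only the fields described above and checks the documented present_count relationship.

```python
import json


def summarize_status(report_json: str) -> str:
    """Return a one-line summary of a `braid status --json` report."""
    report = json.loads(report_json)
    mount = report["mount_point"]              # always present
    advisories = report.get("advisories", [])  # omitted when none
    suffix = f" ({len(advisories)} advisories)" if advisories else ""
    if report["status"] == "not_mounted":
        # Mounted-only fields are absent in this envelope.
        return f"{mount}: not mounted{suffix}"
    total = report["total_devices"]
    missing = report["missing_count"]
    present = report["present_count"]
    if present != total - missing:
        raise ValueError("present_count != total_devices - missing_count")
    return f"{mount}: {report['status']}, {present}/{total} devices present{suffix}"
```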

  • braid unlock – bring the pool online
  • braid replace – repair a degraded pool
  • braid remove-missing – forget a dead device (operates only on btrfs-authoritative MISSING devids; see that command’s note on transient null-underlying state)
  • braid doctor – diagnose pool/disk health and get recovery guidance
  • braid idle – machine-friendly idle/busy check for autosuspend


braid doctor

Runs diagnostic checks on your braid configuration, pool health, RAID profile consistency, LUKS headers, auto-suspend wake path, and alerting hardware. Reports issues and suggests fixes.

When to use it

  • After initial setup, to verify everything is wired correctly.
  • Periodically, to catch drift (missing disks, mixed RAID profiles, broken alert speaker).
  • When something seems wrong and you want a quick health summary.

Basic example

sudo braid doctor

Output:

[ok]   config file     /etc/braid/config.json exists and is valid JSON
[ok]   config schema   required fields present and valid
[ok]   config perms    /etc/braid/config.json permissions ok
[ok]   declared disks  all 3 declared disks present
[ok]   missing devs    no missing devices
[ok]   enospc risk     per-device unallocated space healthy
[ok]   foreign uuids   no foreign LUKS UUIDs in live pool
[ok]   data profiles   data profile: RAID1
[ok]   meta profiles   metadata profile: RAID1
[ok]   system profiles  system profile: RAID1
[ok]   meta pressure   metadata pressure within bounds
[ok]   paused balance  no paused balance
[ok]   smart selftest disk1  passed ~2 days ago
[ok]   smart selftest disk2  passed ~12 days ago
[ok]   smart selftest disk3  passed ~30 days ago
[skip] alert beep      skipped (pass --beep to play the audible alert test beep)
[skip] ups daemon      skipped (braid.ups not enabled)
[skip] braid-online    skipped (braid.ups not enabled)
[skip] wake-on-lan     skipped (braid.autoSuspend not enabled)

The SMART self-test check emits one row per pool drive. If a drive has no recent completed self-test, the row includes a paste-ready smartctl command:

[warn] smart selftest disk2  no completed SMART self-test recorded -- run: smartctl -t short /dev/disk/by-id/...

The hint uses the stable by-id path: braid’s own diagnostic read prefers the member’s live backing device, but a smartctl -t short you run later should use by-id, which survives reboots and controller reordering.

To test the real alert sound:

sudo braid doctor --beep

Machine-readable output

sudo braid doctor --json

Prints a JSON object with status (one of ok, warn, fail, skip) and a checks array. Each check has name, status, and message. Per-drive checks also include subject.

--json mode never plays the alert beep test. The check still appears in the report as skip. --json and --beep conflict; run a separate sudo braid doctor --beep when you want to test the audible alert path.
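A script consuming the doctor report might collect the checks that need attention as sketched below. The function name and label format are hypothetical; the sketch relies only on the documented name, status, message, and optional subject fields, and checks for subject before using it.

```python
def attention_checks(report: dict) -> list[str]:
    """List warn/fail checks from a `braid doctor --json` report."""
    lines = []
    for check in report["checks"]:
        if check["status"] not in ("warn", "fail"):
            continue
        label = check["name"]
        # subject is only present on per-drive results.
        subject = check.get("subject")
        if subject is not None:
            label = f"{label}[{subject}]"
        lines.append(f"{check['status']}: {label}: {check['message']}")
    return lines
```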

What it checks

CheckWhat it does
config_fileConfig exists and is valid JSON
config_schemaRequired fields present and deserializable
config_permissionsCanonical /etc/braid/config.json is not world-writable and is owned by root; custom --config paths skip this check
declared_disksEvery UUID-keyed pool.json member is present, is a block device, has a readable LUKS header, its live LUKS UUID matches the pool.json key, and, when the pool is mounted, is assembled into the live btrfs pool. Warn if a member is missing, is not a block device, has an unreadable LUKS header or probe failure, is present and identity-verified but not assembled into the live pool (offline), or the pool is mounted but its live topology cannot be probed to verify assembly; Fail if a member’s live LUKS UUID does not match its pool.json key.
pool_missing_devicesNo btrfs missing devices in the live pool
enospc_riskWarns when the pool is one disk-loss away from insufficient RAID1 chunk-pair space. Per-device threshold scales with pool size (min(1 GiB, 10% of total device bytes), matching the kernel’s effective data chunk size)
foreign_luks_uuidFail when the live (mounted) pool contains a btrfs device whose LUKS UUID is not declared in pool.json (a foreign disk). The message pairs each foreign UUID and its mapper with a paste-ready btrfs device remove /dev/mapper/<mapper> <mount> then cryptsetup close <mapper> recipe – the observed mapper name and pool mount point are substituted in, and multiple foreign disks each get their own recipe. Skipped when the pool is not mounted.
data_profile_mismatchData block groups all use the same RAID profile
metadata_profile_mismatchMetadata block groups all use the same RAID profile
system_profile_mismatchSystem block groups all use the same RAID profile
metadata_enospc_pressureWarns when metadata is near the next allocation threshold and fewer than two RAID1 devices have enough unallocated space for the next metadata chunk
paused_balanceWarns if a btrfs balance is paused on the mounted pool (e.g. a prior balance interrupted by reboot, manual pause, or kernel pause) and suggests resuming with btrfs balance resume <mount>.
smart_self_testOne result per pool drive: runs smartctl --json -A -l selftest <device> against each – <device> is the member’s live backing device (e.g. /dev/sda) when it is assembled into the mounted pool, otherwise its persisted by-id path (pool offline, probe failed, or that member not currently assembled – e.g. missing or hot-unplugged on a degraded mount) – then reports Fail on an active SMART self-test failure, Warn if no completed test in the last 90 powered-on days (or never), Ok otherwise, or Skip for NVMe/SCSI/unsupported drives. In --json, every per-drive result carries name: "smart_self_test" and a subject field naming the pool member; if pool membership is missing or empty, a single Skip result with name: "smart_self_test" is emitted; if pool membership is corrupt or unreadable, a single Warn result with the same name is emitted instead. In both fallbacks the subject field is omitted. Scripts should check whether subject is present before keying on it.
beep_pathPC speaker alert beep is configured; with --beep, the alert beep command succeeds
ups_daemonWith UPS enabled, upsc is available and can query the UPS daemon; missing or spawn-failed upsc is a failure, daemon unreachable/non-zero upsc is a warning
braid_online_activeWith UPS enabled and the pool mounted, braid-online.service is active so shutdown unmounts the pool. Standalone CLI installs (no NixOS module) skip this – there is no braid-online.service to verify.
wake_on_lanWith auto-suspend enabled, ethtool <interface> reports magic-packet wake support and active Wake-on: g; disabled, unsupported, missing, or unparseable WoL state is a failure
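The 90 powered-on-day threshold in smart_self_test comes to 90 × 24 = 2160 power-on hours. Below is a minimal shell sketch of that staleness rule, assuming both values are given in power-on hours. The function name selftest_is_stale and the strict greater-than comparison are assumptions, not braid's confirmed implementation.

```shell
# selftest_is_stale NOW_HOURS [LAST_TEST_HOURS]
# Hypothetical helper: succeeds (exit 0) when the most recent completed
# self-test is more than 90 powered-on days old, or when there is none.
selftest_is_stale() {
  now_hours=$1
  last_hours=${2:-}
  limit=$((90 * 24))   # 90 powered-on days expressed in hours (2160)
  # No completed test on record counts as stale.
  [ -n "$last_hours" ] || return 0
  [ $((now_hours - last_hours)) -gt "$limit" ]
}
```

For example, `selftest_is_stale 10000 7000 && echo "run a self-test"` prints the reminder, because the last test is 3000 powered-on hours old.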

Flags

| Flag | Effect |
| --- | --- |
| --json | Machine-readable JSON output; never plays the alert beep test |
| --beep | Play the audible alert test beep; conflicts with --json |

Exit codes

  • 0 – no check failed (every result is ok, warn, or skip)
  • 1 – at least one check failed

What happens under the hood

  1. Reads and validates /etc/braid/config.json.
  2. Loads UUID-keyed pool.json and probes each declared disk via cryptsetup isLuks and cryptsetup luksUUID.
  3. If the pool is mounted, queries btrfs filesystem df and btrfs device usage --raw to check RAID profile consistency and metadata allocation headroom, probes for missing devices, reconciles each live pool member’s LUKS UUID against pool.json to flag foreign devices, and runs btrfs balance status to detect paused balances.
  4. For each declared disk, runs smartctl --json -A -l selftest <device> – the member’s live backing device when it is assembled into the mounted pool, otherwise its persisted by-id path (including a member that is missing or unassembled on a degraded but mounted pool) – and parses the self-test log to detect active failures and report the age of the most recent passing entry. See ADR-024 for why present members are probed by live path rather than by-id.
  5. If the braid monitor NixOS module is configured, reports the alert beep check as skipped by default.
  6. With --beep, plays a short test beep through the canonical beep wrapper.
  7. If UPS support is enabled, checks upsc and the mounted-pool braid-online.service shutdown hook.
  8. If auto-suspend is enabled, runs ethtool <interface> to verify runtime Wake-on-LAN state.
  9. Aggregates results and prints a summary.

See also

  • status – live pool health, disk usage, scrub status
  • monitor – automated health check for alerting
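To make the Wake-on-LAN check concrete, here is a minimal shell sketch that reads ethtool output and tests the two conditions the wake_on_lan check describes: magic-packet support and an active g flag. The function name wol_ready is hypothetical, and the exact matching rules braid applies are an assumption. The sketch relies only on the Supports Wake-on: and Wake-on: lines of standard ethtool output.

```shell
# wol_ready: read `ethtool <interface>` output on stdin.
# Exits 0 only when magic-packet wake (g) is both supported and active.
wol_ready() {
  awk '
    # e.g. "Supports Wake-on: pumbg" -> third field lists supported modes
    /Supports Wake-on:/ { if ($3 ~ /g/) supported = 1 }
    # e.g. "Wake-on: g" -> second field lists active modes
    /^[[:space:]]*Wake-on:/ { if ($2 ~ /g/) active = 1 }
    END { if (supported && active) exit 0; exit 1 }
  '
}
```

A typical call would be `ethtool eth0 | wol_ready && echo "WoL armed"`, where eth0 stands in for your interface name.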

braid monitor

Note

Experimental 🧪

This command is experimental: the idea or implementation is still uncertain and may be removed, replaced, or overhauled before braid v1.0.

Checks btrfs device error stats, missing devices, and SMART alerts. Designed to be run automatically by a systemd timer (every 5 minutes by default). Exits with a status code that drives the alert pipeline.

When to use it

You normally don’t run this by hand – the braid-monitor.timer systemd unit runs it automatically. Use it directly when debugging the alert system or testing your monitoring setup.

Basic example

sudo braid monitor

No output on success. Check the exit code:

sudo braid monitor; echo $?

Exit codes

| Code | Meaning |
| --- | --- |
| 0 | Healthy, pool is offline, or another braid command holds the pool lock (cycle skipped, re-evaluated on the next timer tick) |
| 1 | Alert active – one or more problems detected |
| 2 | Pre-monitor setup error (e.g. pool-lock I/O, config load failure) |
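When debugging the alert pipeline by hand, it can help to translate the exit status into words. The following is a small sketch based only on the codes listed above; the function name monitor_exit_meaning is hypothetical and not part of braid.

```shell
# monitor_exit_meaning CODE
# Hypothetical helper: prints a short description of a `braid monitor`
# exit status, following the documented exit-code table.
monitor_exit_meaning() {
  case "$1" in
    0) echo "healthy, pool offline, or cycle skipped" ;;
    1) echo "alert active" ;;
    2) echo "pre-monitor setup error" ;;
    *) echo "undocumented exit code: $1" ;;
  esac
}
```

Usage: `sudo braid monitor; monitor_exit_meaning "$?"`. Note that exit 0 alone cannot distinguish a healthy pool from a skipped cycle.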

What triggers an alert (exit 1)

  • btrfs device errors – any device in the pool has read, write, flush, corruption, or generation errors above the acknowledged baseline, including errors discovered during scrub.
  • Missing device – btrfs reports a device as missing or a pool device has a null underlying path.
  • SMART alert – smartd has written a SMART alert flag (via the braid smartd notifier).
  • Computation error – a probe, parse, btrfs device stats call, mountinfo read, acked-stats baseline load, acked-stats save during self-heal, or alert-latch load/quarantine failed. Monitor fails closed: it latches a ComputationError cause so the beeper fires and braid status shows the detail.

Flags

None. Monitor has no flags – it reads from the braid config and state files.

What happens under the hood

  1. Checks if the pool is mounted. If not, exits 0 (nothing to monitor).
  2. Runs btrfs device stats on the pool mount point.
  3. Loads the acknowledged-stats baseline (acked-stats.json) from a previous braid ack. If the file is unreadable or unparseable, monitor fails closed – it latches a ComputationError rather than firing every acknowledged cause against an empty baseline.
  4. Self-heals stale ack state before computing alerts: prunes baseline entries for devices no longer in the pool, and clears the missing-acked flag for any device that was acknowledged missing but is now present again. If the baseline changed, the updated acked-stats.json is written immediately; a write failure (e.g. EROFS, ENOSPC) is itself a fail-closed ComputationError.
  5. Computes alert causes against the reconciled baseline: btrfs device errors above the baseline, missing/null-underlying devices, and the smartd alert flag.
  6. Merges the causes into the alert latch (alert-latch.json). The latch is sticky: once an alert fires, it stays active until braid ack clears it.
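The baseline comparison in step 5 can be illustrated with a short shell sketch. It assumes two snapshots in the plain `btrfs device stats` text format (`[device].counter value`); braid itself stores the baseline as acked-stats.json, so the text baseline file here is purely illustrative, and the function name errors_above_baseline is hypothetical.

```shell
# errors_above_baseline BASELINE_FILE CURRENT_FILE
# Both files hold `btrfs device stats` style lines, for example:
#   [/dev/mapper/example].read_io_errs    3
# Prints every counter whose current value exceeds the baseline value.
# A counter absent from the baseline is compared against 0.
errors_above_baseline() {
  awk '
    FILENAME == ARGV[1] { base[$1] = $2; next }   # first file: baseline
    ($2 + 0) > (base[$1] + 0) { print $1, base[$1] + 0, "->", $2 }
  ' "$1" "$2"
}
```

An empty result means nothing rose above the acknowledged counts. Unlike this sketch, monitor fails closed when the baseline is unreadable rather than treating it as empty.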

Alert pipeline

braid monitor      --writes--> alert-latch.json --> braid status / braid tui (display)
(timer, every 5m)  --exit 1--> braid-alert.service (beeper + alertCommand)

smartd  --start-->  braid-alert.service (beeper)
        --writes--> smartd-alert --> next braid monitor cycle (latches SmartdAlert)

On exit 1, the braid-monitor.service wrapper starts braid-alert.service (the beeper, plus any alertCommand). After that, two things stay active until you braid ack, each held by a different mechanism:

  • The latch and exit 1 – held by monitor. Each cycle it writes the live causes to alert-latch.json, merging them into the existing latch, and re-exits 1 while any cause remains. braid status and the TUI read the same file for display.
  • The beep – held by braid-alert.service itself, not the read-back. Once started it stays active on its own (the backoff beep loop when beep is enabled, or a RemainAfterExit oneshot when it’s off), so the wrapper’s per-cycle systemctl start is a no-op and a skipped cycle (offline or lock-contended exit 0) does not silence it. The service never reads alert-latch.json or the smartd-alert flag.

smartd is a second, independent trigger: on a SMART fault it starts braid-alert.service directly and writes the smartd-alert flag, which the next monitor cycle latches as a SmartdAlert cause.

The beep stops only when braid ack clears the latch and runs systemctl stop braid-alert.service.


See also

  • ack – acknowledge alerts and silence the beeper
  • doctor – one-time diagnostic; pass --beep to test the alert beep
  • status – shows active alerts in the status output

braid ack

Note

Experimental 🧪

This command is experimental: the idea or implementation is still uncertain and may be removed, replaced, or overhauled before braid v1.0.

Acknowledges active alerts and silences the PC speaker beeper. When there is an active alert source on a mounted pool, it also sets the current device error counts as the new baseline so the same condition won’t re-trigger.

When to use it

  • The beeper is going off and you’ve investigated the cause.
  • braid status or braid tui shows active alerts you’ve already addressed.
  • After replacing a disk or running a scrub to clear errors.

Basic example

sudo braid ack

Output:

acknowledged 3 alerts

If there’s nothing to acknowledge:

no active alerts

What happens under the hood

  1. Reads the alert latch to determine how many alerts are active.
  2. If the pool is mounted:
    • If a latch entry exists, the smartd alert flag is present, or the latch is corrupt, snapshots the current btrfs device stats error counters and missing-device state.
    • Writes that snapshot as the new acknowledged baseline (acked-stats.json). Future monitor runs compare against this baseline, so the same error counts won’t trigger again.
    • If none of those alert sources is present, exits 0 with no active alerts and does not query btrfs or rewrite acked-stats.json.
  3. Stops braid-alert.service (the beeper), best-effort. This is the first cleanup step, so the stop is attempted before any later file-removal I/O error can short-circuit the rest of cleanup.
  4. Removes the smartd alert flag (smartd-alert) if present.
  5. Removes the alert latch file (alert-latch.json).
  6. Removes the corrupt-latch sidecar (alert-latch.json.corrupt) if present.

On a cleanup I/O error, ack preserves retry state so the next braid ack resumes cleanup after the I/O fault is fixed.

When ack reaches cleanup and a later cleanup step fails, it leaves /var/lib/braid/alert-cleanup-pending. braid status surfaces ack cleanup pending -- re-run `braid ack` to resume as an alert cause until cleanup finishes. If that sentinel is the only remaining alert signal, the next braid ack re-enters cleanup directly (no btrfs probe, no baseline rewrite) and prints acknowledged current alerts on success – expected output because only leftover cleanup ran.

If the pool is offline but alerts exist (e.g., a latched smartd alert), ack still clears the latch and flag without snapshotting device stats. Offline means there is no mount at the configured mount point. If that path is occupied by a non-btrfs filesystem, braid ack returns a probe error naming the fstype and preserves alert-latch.json, smartd-alert, and acked-stats.json.
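When a cleanup step fails, it can be useful to see which alert signals are still on disk. The sketch below lists the state files named in this section that exist in a given directory. Only alert-cleanup-pending is documented as living in /var/lib/braid; that the other files share that directory is an assumption, and the function name alert_signals is hypothetical.

```shell
# alert_signals [STATE_DIR]
# Hypothetical helper: prints the name of each alert-related state file
# present in STATE_DIR (default /var/lib/braid, assumed for all files).
alert_signals() {
  dir=${1:-/var/lib/braid}
  for f in alert-latch.json alert-latch.json.corrupt smartd-alert alert-cleanup-pending; do
    [ -e "$dir/$f" ] && echo "$f"
  done
  return 0
}
```

If the only line printed is alert-cleanup-pending, the next `sudo braid ack` is expected to resume cleanup directly, as described above.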

Flags

None.

Safety checks

  • If the pool is not mounted and no alerts are latched, ack refuses with “pool is not mounted – nothing to acknowledge”.
  • If the pool is mounted but healthy with no latch entries, no smartd alert flag, and no corrupt latch, ack is a no-op and does not mutate acked-stats.json.
  • If the configured mount point is mounted as something other than btrfs, ack refuses with the fstype mismatch and does not clear or rewrite alert state.
  • If another braid operation holds the pool lock (/run/braid-pool.lock), waits up to 10 seconds for it to finish: proceeds if the lock frees within that window, otherwise exits 1 with the pool-lock retry message.

See also

  • monitor – the automated check that triggers alerts
  • status – view active alerts
  • tui – interactive dashboard shows alert state

braid enroll

Note

Experimental 🧪

This command is experimental: the idea or implementation is still uncertain and may be removed, replaced, or overhauled before braid v1.0.

Enrolls a binary keyfile into LUKS slot 1 on all pool disks. Used to set up USB auto-unlock: plug in a USB drive with the keyfile, and braid unlock --key-file can open the pool without typing a passphrase.

When to use it

  • Setting up unattended unlock via USB keyfile.
  • After adding a new disk to the pool (enroll the keyfile on it too).

Basic example

Generate a new keyfile on a USB drive and enroll it on all pool disks:

sudo braid enroll /mnt/usb --generate

/mnt/usb must already exist and be mounted. This creates /mnt/usb/braid.key (4096 bytes of random data) and adds it to LUKS slot 1 on every disk in the pool. You’ll be prompted for the pool passphrase.
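For orientation, the keyfile-creation half of --generate can be approximated in plain shell. This is a sketch under stated assumptions, not braid's implementation: the function name make_keyfile, the owner-only permissions, and the use of /dev/urandom are all assumptions. It also omits the mount-point check that braid performs.

```shell
# make_keyfile DIR
# Hypothetical helper: writes 4096 random bytes to DIR/braid.key and
# refuses to overwrite an existing keyfile.
make_keyfile() {
  dir=$1
  key="$dir/braid.key"
  [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
  [ ! -e "$key" ] || { echo "refusing to overwrite $key" >&2; return 1; }
  # Restrict permissions before any key material is written (assumed policy).
  ( umask 077 && head -c 4096 /dev/urandom > "$key" )
}
```

A file made this way could then be added by hand with `cryptsetup luksAddKey --key-slot 1 <device> <keyfile>`. Whether braid uses that exact invocation is not documented here.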

Common variations

Enroll an existing keyfile (already at /mnt/usb/braid.key):

sudo braid enroll /mnt/usb

Non-interactive (passphrase from stdin):

printf '%s' 'my-passphrase' | sudo braid enroll /mnt/usb --generate --passphrase-stdin

Passphrase from a file:

sudo braid enroll /mnt/usb --generate --passphrase-file /root/passphrase.txt

Dry run (preview what would happen):

sudo braid enroll /mnt/usb --generate --dry-run

Flags

| Flag | Effect |
| --- | --- |
| --generate | Create a new 4096-byte random keyfile before enrolling; the target directory must already be a mount point |
| --passphrase-stdin | Read passphrase from stdin instead of TTY prompt |
| --passphrase-file <path> | Read passphrase from a file instead of TTY prompt (conflicts with --passphrase-stdin) |
| --dry-run | Show what would happen without making changes |

What happens under the hood

  1. Checks for a pending operation journal (refuses if one exists).
  2. With --generate: Validates that the target directory exists, is a directory, is already a mount point, and does not already contain braid.key (if a prior --generate run was interrupted, drop --generate and re-run to finish enrolling the existing keyfile; otherwise remove it manually first).
  3. Without --generate: Validates that DIR/braid.key already exists and is a regular file.
  4. Scans pool membership for present LUKS disks. Absent or non-LUKS disks are skipped with a message. If a present disk’s live LUKS UUID does not match the UUID recorded in pool.json – the disk was swapped, cloned, or reformatted – enrollment aborts before any passphrase prompt or slot change; detach the foreign disk and reattach the original, or run braid replace if the swap was intentional.
  5. Verifies the passphrase against every present pool disk before any keyfile probe.
  6. Without --generate: Probes the keyfile against each disk. If it authenticates, reports “already enrolled” and skips that disk for the rest of enrollment. A rejected probe means the disk still needs enrollment; any other probe failure (e.g. device busy) aborts immediately rather than treating the disk as un-enrolled.
  7. For each disk still needing enrollment, checks LUKS slot 1: proceeds if free; refuses with an error if occupied by an unknown key (you must remove it first with cryptsetup luksKillSlot).
  8. With --generate: Only after all preflight checks pass, generates the random keyfile.
  9. Enrolls the keyfile into LUKS slot 1 on each disk.
  10. Creates a LUKS header backup for each modified disk.

See Pending LUKS header backups – copy each .luksheader off-system and delete the local copy.

Safety checks

  • Refuses if a pending operation journal (pending-op.json) exists – run braid recover to reconcile.
  • Refuses if another braid operation is in progress (pool lock /run/braid-pool.lock is held) – retry once it finishes.
  • Refuses if a present disk’s live LUKS UUID no longer matches its pool.json record – the disk was swapped, cloned, or reformatted; detach the foreign disk and reattach the original, or run braid replace if the swap was intentional. This UUID check is repeated at the mutation boundary, after the passphrase is read and before any keyfile is enrolled, so a disk swapped during the passphrase prompt is still caught before slot 1 is touched.
  • With --generate, refuses unless the target directory is already a mount point.
  • Passphrase is verified before any mutations.
  • Slot 1 conflicts are detected before the keyfile is generated, so you never end up with an orphan keyfile.
  • With --generate, refuses if braid.key already exists at the target path; if a prior --generate run was interrupted, drop --generate and re-run to finish enrolling the existing keyfile.
  • Without --generate, refuses if the keyfile doesn’t exist.
  • Idempotent: if the keyfile is already enrolled on a disk, that disk is skipped.

See also

  • unlock – use --key-file to unlock with the enrolled keyfile

braid discover

Note

Experimental 🧪

This command is experimental: the idea or implementation is still uncertain and may be removed, replaced, or overhauled before braid v1.0.

Scans /dev/disk/by-id/ for LUKS devices with braid-* labels, reads their LUKS UUIDs, and reconstructs UUID-keyed pool membership. This is a repair tool for recovering a lost or corrupt pool.json.

When to use it

  • Your pool.json was deleted or corrupted.
  • You’re migrating disks to a new machine and need to rebuild pool state.

The normal path for adding disks is braid add. Use discover only when pool.json is missing or corrupt – it refuses to run while a valid pool.json exists. To see the disks already in a healthy pool, use braid status.

Basic example

When pool.json is missing, preview the membership discover would rebuild before saving it (no changes):

sudo braid discover

Output:

  ironwolf = /dev/disk/by-id/ata-ST12000VN0008_XXXXXXXX
  toshiba = /dev/disk/by-id/ata-TOSHIBA_MN08ACA16T_XXXXXXXX
pass --write to save to /var/lib/braid/pool.json

Bare discover prints this preview only when pool.json is absent. Over a valid pool.json it exits with an error – use braid status to view current membership. Over a corrupt pool.json it also refuses, pointing you to discover --write (see Common variations).

The membership rows are written to stdout; the pass --write to save hint, the --write “pool membership written” confirmation, scan warnings, and errors go to stderr. So braid discover > members (or braid discover | grep <disk>) captures only the rows.
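Because only the membership rows reach stdout, they are straightforward to post-process. The sketch below turns rows of the form shown in the example output into tab-separated name and path pairs. The function name member_paths is hypothetical, and since discover is experimental, the row format itself may change.

```shell
# member_paths: read `braid discover` rows on stdin, in the form
#   <name> = /dev/disk/by-id/<id>
# and print "<name><TAB><path>" for each row. Other lines are ignored.
member_paths() {
  awk '$2 == "=" && $3 ~ /^\/dev\/disk\/by-id\// { printf "%s\t%s\n", $1, $3 }'
}
```

For example, `sudo braid discover | member_paths | cut -f2` would list just the by-id paths.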

Common variations

Write the discovered membership to pool.json:

sudo braid discover --write

If you can name the expected member count ahead of time, pass it as a fail-closed guard against a detached disk or stray braid-labeled disk:

sudo braid discover --write --expect-count 3

Flags

| Flag | Effect |
| --- | --- |
| --write | Persist the discovered membership to pool.json |
| --expect-count <N> | With --write, refuse to write if the discovered member count is not exactly N |

What happens under the hood

  1. With --write, refuses if a pending operation journal (pending-op.json) exists. Bare discover is read-only and skips this gate.
  2. Refuses over an existing UUID-keyed pool.json (bare and --write). A corrupt or off-schema pool.json is the documented rebuild path: bare discover prints the rebuild remediation, and discover --write writes a forensic pool.json.corrupt-<RFC3339-UTC> snapshot adjacent to the new file, then rebuilds. If the snapshot cannot be written (full disk, read-only state directory), discover --write refuses rather than destroy the corrupt original.
  3. Reads all entries in /dev/disk/by-id/ in sorted filename order, skipping partition entries (e.g., ata-TOSHIBA-part1). Sorting up front makes label-collision reporting (step 10) independent of read_dir order.
  4. Resolves each by-id symlink to its canonical kernel device. Skips with a cannot canonicalize warning when the symlink is dangling (e.g., udev didn’t clean up after a disk removal).
  5. For each entry, runs cryptsetup isLuks to check if it’s a LUKS device.
  6. Runs cryptsetup luksDump to read the LUKS label, version, and UUID.
  7. Skips LUKS1 devices (braid requires LUKS2).
  8. Matches labels of the form braid-<name> and extracts the disk name.
  9. Uses the canonical kernel device resolved above to detect multiple /dev/disk/by-id/ symlinks for the same physical disk (e.g. wwn- and ata- aliases), then picks the most stable one (preference order: wwn > nvme > scsi > ata > usb > other, with lexicographic tie-breaking).
  10. If two symlinks that share the same braid-<name> label resolve to different kernel devices, refuses the entire scan with an error. Two physically distinct disks share a label – typically after a dd clone or a manual mislabel – and braid cannot safely choose one. Relabel or detach one disk before retrying.
  11. If two distinct devices share one LUKS UUID, refuses the entire scan. This usually means a cloned disk is attached.
  12. With --write, saves the discovered UUID-keyed membership to pool.json.
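The alias preference in step 9 can be sketched in shell as a rank-then-sort pipeline. This is an illustration under assumptions: the function name pick_stable_alias is hypothetical, matching by filename prefix is assumed, and ties are broken in ascending byte order.

```shell
# pick_stable_alias: read the by-id names that resolve to ONE physical
# disk (one per line) and print the preferred alias.
# Preference: wwn > nvme > scsi > ata > usb > other, then lexicographic.
pick_stable_alias() {
  awk '
    function rank(n) {
      if (n ~ /^wwn-/)  return 0
      if (n ~ /^nvme-/) return 1
      if (n ~ /^scsi-/) return 2
      if (n ~ /^ata-/)  return 3
      if (n ~ /^usb-/)  return 4
      return 5
    }
    /-part[0-9]+$/ { next }            # skip partition entries
    { print rank($0) "\t" $0 }         # prefix each name with its rank
  ' | LC_ALL=C sort | head -n 1 | cut -f2
}
```

Because every rank is a single digit, a plain byte-order sort orders first by rank and then by name.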

Safety checks

  • Refuses any operation on an existing UUID-keyed pool.json. Corrupt or off-schema files are allowed for --write rebuild only; the original is copied to pool.json.corrupt-<RFC3339-UTC> before overwrite, and --write refuses if that snapshot cannot be written (full disk, read-only state directory). Run with all intended pool members attached; see docs/internals/luks-unlock.md.
  • With --write, refuses if a pending operation journal (pending-op.json) exists – run braid recover to reconcile.
  • With --write, refuses if another braid operation is in progress (pool lock /run/braid-pool.lock is held) – retry once it finishes.
  • With --expect-count, refuses to write if the discovered member count is not exactly the requested count.
  • Without --write, makes no changes at all – read-only scan that takes no pool lock and does not consult the pending-op journal.
  • Dangling /dev/disk/by-id/ symlinks are skipped with a warning – a diagnostic operators need when udev leaves a stale alias behind after a disk swap.
  • LUKS1 devices are skipped with a warning.
  • If no braid-labeled LUKS2 devices are found, discover exits 1 with no braid-labeled LUKS2 devices found -- ... (both bare and --write) – check the intended members are attached, readable, and labeled braid-<name> as LUKS2. An array that is entirely LUKS1, detached, or unreadable lands here, with any present-but-skipped disk warned about above.
  • Refuses the scan if two distinct devices share the same braid-<name> LUKS label – relabel or detach one disk before retrying.
  • Refuses the scan if two distinct devices share the same LUKS UUID – detach the cloned or unintended disk before retrying.

See also

  • recover – resume an interrupted operation (has its own membership rebuild from live pool state)
  • status – view current pool membership
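The duplicate-UUID refusal in the safety checks amounts to grouping canonical devices by LUKS UUID and flagging any UUID seen on more than one distinct device. Here is a minimal sketch, assuming input lines of the form `<luks-uuid> <canonical-device>`; the function name duplicate_uuids and the input format are hypothetical.

```shell
# duplicate_uuids: read "<luks-uuid> <canonical-device>" lines on stdin
# and print each UUID that appears on more than one DISTINCT device.
# Repeated lines for the same device (aliases) are counted once.
duplicate_uuids() {
  awk '
    !seen[$1, $2]++ { count[$1]++; devs[$1] = devs[$1] " " $2 }
    END { for (u in count) if (count[u] > 1) print u ":" devs[u] }
  ' | LC_ALL=C sort
}
```

Any output means a cloned or unintended disk is probably attached, which is the case where discover refuses the whole scan.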

braid recover

Note

Experimental 🧪

This command is experimental: the idea or implementation is still uncertain and may be removed, replaced, or overhauled before braid v1.0.

Resumes from an interrupted operation (add, remove, replace) by opening LUKS devices, mounting the pool, rebuilding pool.json from live pool state when appropriate, running owed maintenance when the btrfs balance state is idle, and clearing the pending-operation journal only after the safe recovery path completes.

When to use it

  • After a crash, power failure, or interrupted braid command.
  • When braid status or other commands show “pending operation – run braid recover”.
  • Only available when pending-op.json exists.

Basic example

sudo braid recover

You’ll be prompted for the pool passphrase. Output shows the recovery process:

Recovering from interrupted "add" operation (started 2026-03-15T14:30:00Z)...
pool.json written from completed add membership.
pool.json written from committed add membership.
pending-op.json cleared. Recovery complete.

Before the pool.json lines, a real run prints either per-disk LUKS-open and mount rows (if the pool was offline) or a single pool already mounted at ... row (if it was already mounted). When a RAID1 balance is owed and the btrfs balance state is idle, it prints a RAID1 soft-balance replay row pair after the committed line and before the final pending-op.json cleared line. If the balance state is paused, running, or unknown, recover fails before the replay row and does not clear the journal.

Important

If recover refuses owed RAID1 replay because btrfs balance state is paused, running, or unknown, it leaves pending-op.json in place. Inspect btrfs manually before clearing recovery state.

Common variations

Non-interactive (passphrase from stdin):

printf '%s' 'my-passphrase' | sudo braid recover --passphrase-stdin

Passphrase from a file:

sudo braid recover --passphrase-file /root/passphrase.txt

Recover with a missing disk (degraded mode):

sudo braid recover --allow-degraded

Preview what would happen:

sudo braid recover --dry-run

Flags

| Flag | Effect |
| --- | --- |
| --passphrase-stdin | Read passphrase from stdin instead of TTY prompt |
| --passphrase-file <path> | Read passphrase from a file instead of TTY prompt (conflicts with --passphrase-stdin) |
| --allow-degraded | Allow mounting with missing devices (redundancy is reduced until you replace the missing device) |
| --dry-run | Show what would be done without making changes |
| --progress auto\|always\|never | Control progress display (default: auto) |

What happens under the hood

  1. Loads pending-op.json (refuses if absent – nothing to recover).

  2. Chooses the mount membership from the journal phase. Existing-pool add and remove-missing PoolMutation phases mount from the pre-operation membership. Add, remove-missing, and replace post-maintenance phases mount from the committed target membership. Replace PoolMutation, bootstrap add PoolMutation (the first disk, whose pre-operation membership is empty), and Remove mount from the admission membership (pre-operation snapshot plus target-only members) – for replace this matters because the kernel may still be completing dev_replace.

  3. Opens LUKS devices and mounts the pool (or reuses the existing mount if already mounted). Exception: a Replace::PoolMutation journal on an externally-mounted pool is refused (see Safety checks); replace post-maintenance recovery on an already-mounted pool is allowed.

  4. For Replace::PoolMutation only, if a kernel-resumed btrfs replace is in progress, waits for it to finish.

  5. For Replace::PoolMutation only, if the pool was just mounted by this recover run, performs a full relock-and-remount cycle (umount, btrfs device scan --forget, close LUKS, reopen, remount) to ensure the kernel’s in-memory device topology matches the on-disk state.

  6. Probes the live pool to discover actual membership.

  7. For interrupted existing-pool add PoolMutation, first runs a non-destructive Add target reconciliation pass: any journaled add target whose underlying disk is physically present and LUKS-openable is opened, scanned, and followed by a live-pool re-probe. Targets that turn out to be live pool members are adopted into the recovered pool.json without wipefs or btrfs device add.

  8. For add PoolMutation, replays only journaled targets that are not already live. RecoverableBraidLabeled targets are replayed via wipefs --all --types btrfs plus btrfs device add -f after LUKS UUID and visible-FSID checks. FreshLuks targets that are physically present are replayed from the journaled format options, skipping format if the disk already has the expected LUKS label; if the journal carried enroll_key_file, the keyfile is re-enrolled, then the LUKS header is backed up, the mapper is opened, and btrfs device add runs without -f. FreshLuks targets that are physically absent or carry an unexpected LUKS label make recover fail and leave pending-op.json in place so the disk can be reattached or replaced and recovery rerun.

    See Pending LUKS header backups – copy each .luksheader off-system and delete the local copy.

  9. For add PostAddBalanceRaid1, does not format, enroll, back up headers as target prep, wipe, or add disks. It only validates the committed live pool and runs the owed RAID1 balance when btrfs balance state is idle; a paused, running, or unknown balance state fails closed with the journal preserved.

  10. For replace and remove-missing PoolMutation, detects whether the primary btrfs membership mutation committed. If it did not commit, recover restores/keeps the pre-operation pool.json, clears the journal, and tells you to rerun the original command. It does not rerun btrfs replace start or btrfs device remove.

  11. For replace and remove-missing post-maintenance phases, validates committed live membership, repairs pool.json if needed, and finishes only owed maintenance such as resize or, when btrfs balance state is idle, soft RAID1 balance; it does not rerun the primary btrfs membership mutation. A paused, running, or unknown balance state before owed RAID1 replay fails closed with pending-op.json preserved.

  12. Resolves /dev/disk/by-id/ paths from live LUKS UUIDs, using btrfs devid only for missing or null-underlying bindings (not from the journal’s by-id path, which may be stale).

  13. Writes or repairs pool.json only after the journal phase allows it and live membership is complete.

  14. Clears pending-op.json only after membership is complete and any owed balance work is done.

Safety checks

  • Refuses if no pending-op.json exists.
  • Refuses if another braid operation is in progress (pool lock /run/braid-pool.lock is held) – retry once it finishes.
  • Refuses to adopt live pool members outside the recovery admission membership for the current journal phase (guards against devices added outside braid). Most phases admit the pre-operation snapshot plus target-only members; Replace::PostReplaceMaintenance admits only the committed target membership because btrfs preserves the old device’s devid on the replacement after commit.
  • Hard-fails if a live pool device has no /dev/disk/by-id/ symlink (recovery can’t guess a stable identifier).
  • Detects interrupted bootstrap add (first disk, no filesystem yet) and gives specific wipe-and-retry instructions instead of a confusing mount error.
  • Refuses to overwrite pool.json or clear pending-op.json if the post-mount probe at the configured mount point sees the pool unmounted or with zero btrfs devices. The mount may have been removed externally between recover’s mount step and membership probe; pool.json and pending-op.json are both preserved – investigate the mount, then re-run braid recover.
  • For existing-pool add recovery, refuses to clear the journal while any journaled add target is missing from the live pool.
  • Returned-disk replay may need a pool passphrase even when the pool is already mounted, because the mapper for the journaled target may still be closed.
  • Without --allow-degraded, refuses to mount if devices are missing (exit code 2 for degraded-refused, distinguishing it from other errors).
  • Refuses to recover Replace::PoolMutation when the pool is already mounted (admin-mounted, circumventing braid’s pending-op preflight on unlock). The kernel may have resumed an interrupted dev_replace on that mount session, leaving stale in-memory device state that recover cannot scrub without unmounting – which it will not do on a mount it does not own. Remediation: sudo braid lock; sudo braid recover.

See also

  • status – shows pending operation state and prompts you to recover
  • discover – rebuild UUID-keyed pool.json from LUKS labels and UUIDs (when there’s no journal)
  • unlock – normal unlock (when no journal exists)
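Scripts that wrap recover can use the dedicated exit code 2 to tell a refused degraded mount apart from other failures. The sketch below relies only on that documented code and treats every other non-zero status generically; the function name recover_next_step is hypothetical.

```shell
# recover_next_step CODE
# Hypothetical helper: suggests a next step from a `braid recover`
# exit status. Only 0 and 2 have documented meanings in this sketch.
recover_next_step() {
  case "$1" in
    0) echo "recovery complete" ;;
    2) echo "degraded mount refused: reattach the missing disk or re-run with --allow-degraded" ;;
    *) echo "recover failed (exit $1): check the error before retrying" ;;
  esac
}
```

Usage: `sudo braid recover; recover_next_step "$?"`. The sketch deliberately never adds --allow-degraded on its own, since that choice reduces redundancy and should stay with the operator.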

braid tui

Interactive terminal dashboard showing pool state, disk health, allocation, scrub status, and active alerts.

When to use it

  • Quick visual overview of your NAS health.
  • Checking disk-level detail (LUKS cipher, SMART health, error counts, transport).
  • Monitoring during or after a scrub.

Basic example

sudo braid tui

Demo mode

Try the TUI without a real pool (no config, btrfs, or root required):

braid tui --demo

Demo mode shows three fake disks with sample data, useful for exploring the interface.

Flags

| Flag | Effect |
| --- | --- |
| --demo | Run with fake data (no config, btrfs, or root required) |

Keybindings

| Key | Action |
| --- | --- |
| q | Quit |
| r | Reload pool data now |
| Tab | Next tab |
| Shift-Tab | Previous tab |
| j / k | Select next/previous disk (Data/Scrub) or move within the focused Browse region |
| h / l | Move left/right across Browse regions |
| Ctrl-D / Ctrl-U | Page Browse content down/up (one screen at a time) |
| Enter | Open disk detail popup (Data) or drill into Browse content |
| Esc | Close disk detail popup or return from Browse drill-in |
| ? | Toggle help overlay |
| Shift-R | Reset session temperature hi/lo watermarks |

What it shows

Main view – pool status, mount point, the Profile summary (data <X> | meta <Y> | system <Z>), capacity bar, balance state, and active alerts and advisories. Each Profile value is one of:

  • the profile name verbatim, for a single recognized profile such as RAID1, DUP, or single;
  • partial, when that block-group type spans more than one profile;
  • the raw profile name verbatim, for an unrecognized profile like RAID5;
  • unknown, only when no block groups of that type were reported.

Refreshing – while the pool is mounted, pool, disk, scrub, and alert data refresh automatically about every 10 seconds and immediately when you press r. While the pool is not mounted, that data stays manual-only via r. When enabled, Fans and UPS telemetry also refresh automatically every 5 seconds and immediately on r. The footer’s Reload: r spinner and idle (Xms) duration reflect pool refreshes, including automatic pool refreshes; automatic Fans/UPS polls do not update it. The view redraws periodically while idle so relative ago times stay current.

Disk table – one row per disk: number, name, bus (sata/usb/nvme), SMART health, temperature, btrfs device-error count, and allocated (shown as percent used and allocated/size).

Fans (when fan control is enabled) – Data-tab row with a daemon: header annotation for hddfancontrol-braid.service: active is green, activating and inactive are yellow, failed is red, and unknown is gray. The annotation is not a column; the columns are PWM (raw/255 plus percent), RPM, Driving (the hottest drive and its temperature), and Curve. See the fan control guide.

UPS (when UPS support is enabled) – Data-tab row with the same daemon: header annotation for the NUT daemon. The columns are Status (color-coded flags), Battery, Runtime, and Load. See the UPS guide for Status severity.

Disk detail popup (press Enter on a disk) – disk name, LUKS lock status, cipher, key size, keyslot count, an allocations table (type/profile/size plus unallocated), the btrfs device-error breakdown (read/write/flush/corruption/generation), and a SMART section with the health verdict plus its supporting evidence rows (per-protocol: SATA reallocated/pending/uncorrectable, or NVMe critical-warning/media-errors/available-spare/percentage-used). A row for an out-of-spec attribute is colored red. Temperature is not repeated here – it has its own column in the disk table.

Tabs – three tabs, switched with Tab / Shift-Tab:

  • Data (default) – pool allocation breakdown, disk table, capacity bar, plus Fans and UPS rows when enabled.
  • Scrub – per-device scrub state, progress, and timing.
  • Browse – raw CLI output inspector across five tool families: Btrfs, NUT (UPS), Systemd, SMART (smartctl), and lsblk. Btrfs views include filesystem usage/show/df/commit-stats, device usage/stats, subvolumes with drill-in plus raw full/snapshot/deleted/default views, scrub status/limits, balance status, quota status/qgroups, and inspect-internal chunks. UPS views include status, raw variables, supported instant commands, connected clients, settable variables, and UPS discovery. Systemd views include unit status, show, braid units, failed units, timers, and mounts. SMART views include device scan, health, info, attributes, and self-test/error logs. lsblk views include tree, filesystems, disks, all-columns, and SCSI. NUT > UPSes can help find the correct ups.name before UPS support is enabled.

See also

  • status – non-interactive pool health output
  • ups status – non-interactive UPS state output

braid ups status

Note

Experimental 🧪

This command is experimental: the idea or implementation is still uncertain and may be removed, replaced, or overhauled before braid v1.0.

Query the UPS (NUT) daemon for the currently configured UPS and render a curated human summary or the serialized parsed model as JSON.

Requires UPS support enabled (braid.enable = true and braid.ups.enable = true). With UPS disabled the command prints an enable hint and exits 0 (not an error).

Basic example

sudo braid ups status

Output:

UPS: ups
Status: OL
Battery: 100%
Runtime: 30m 0s
Load: 17% (56 W estimated)
Input: 120.0 V (transfer 88-142 V)
Device: APC Back-UPS ES 550G
Battery manufactured: 2023/04/12
Last test: Done and passed

JSON output

sudo braid ups status --json | jq .

Emits the serialized UpscOutput model. A success body (no top-level error) is trustworthy telemetry: braid faithfully serialized whatever upsc reported. It is not a claim that the UPS is online – on-battery (OB), low-battery (OB LB), and all-unrecognized status sets are all success bodies with no error and no warning.

To judge UPS state, read status_flags: utility power is proven only by the presence of OL with no blocking flag (OB, LB, TESTFAIL, COMMBAD, FSD) – the same affirmative-OL criterion braid’s own mutation preflight uses (see the UPS guide).

Shape:

{
  "status_flags": ["OL"],
  "battery": {
    "charge_pct": 100,
    "runtime_secs": 1800,
    "voltage": "27.0",
    "type": "PbAc",
    "mfr_date": "2023/04/12",
    "runtime_low_secs": 120
  },
  "load_pct": 17,
  "realpower_nominal_watts": 330,
  "input": {
    "voltage": "120.0",
    "transfer_low": "88",
    "transfer_high": "142",
    "sensitivity": "medium"
  },
  "test_result": "Done and passed",
  "device": {
    "model": "Back-UPS ES 550G",
    "mfr": "APC",
    "serial": "3B1234X56789",
    "type": "ups"
  },
  "extra": { "driver.name": "usbhid-ups", "battery.charge.low": "10" }
}
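To make the status_flags rule concrete, here is a minimal Python sketch of the affirmative-OL check described before the shape. The function name and the `BLOCKING_FLAGS` constant are illustrative; only the flag names come from this page.

```python
# Flags that block an "on utility power" conclusion, as listed above.
BLOCKING_FLAGS = {"OB", "LB", "TESTFAIL", "COMMBAD", "FSD"}


def utility_power_proven(status_flags):
    """Return True only when OL is present and no blocking flag is set."""
    flags = set(status_flags)
    return "OL" in flags and not (flags & BLOCKING_FLAGS)
```

Note that an empty or all-unrecognized flag list returns False: the absence of bad flags is not treated as proof of utility power.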

In a success body (the shape above – a reachable UPS, no top-level error), every typed field is always present: a scalar the driver did not report serializes as null rather than being omitted, and the battery, input, and device objects are always present even when all of their fields are null. Test typed fields for a null value, not for a missing key – a has(...) check on any typed key always returns true. status_flags and extra are always present but never null ([] and {} when empty). The only field omitted when absent is the top-level warning (see the table below). Error bodies are the exception – they carry error/detail and none of the typed keys, so a script must confirm there is no top-level error before relying on the rule above.

status_flags lists flags in first-seen ups.status token order (whitespace normalized, duplicate tokens dropped); braid does not sort them, so the order is deterministic for a given UPS state.
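The ordering rule is easy to mirror when testing a consumer script. The sketch below reproduces the described behavior (split on whitespace, keep first-seen order, drop duplicates); it is not taken from braid's source, and `parse_status_flags` is a hypothetical name.

```python
def parse_status_flags(ups_status):
    """Split a raw ups.status string into flags.

    str.split() with no argument collapses runs of whitespace, and
    dict.fromkeys() keeps the first occurrence of each token in order.
    """
    return list(dict.fromkeys(ups_status.split()))
```

A raw value such as `"OL  CHRG OL"` therefore yields `["OL", "CHRG"]`, and an empty string yields `[]`.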

extra is a string-keyed map of every upsc line that did not land in a typed field above. Its contents vary with the NUT driver and version (typically driver.* debug keys plus other untyped fields like battery.charge.low or input.voltage.nominal), and values are kept verbatim as strings.

Distinct sentinels cover the common non-OK cases:

| Condition | JSON | Exit code |
| --- | --- | --- |
| UPS reachable with populated ups.status | serialized UpscOutput | 0 |
| UPS reachable but ups.status empty | serialized UpscOutput plus "warning": "ups_status_empty" | 0 |
| UPS query failed | {"error": "query_failed", "detail": "exit <code>: <stderr>"} | 1 |
| UPS invocation failed (upsc could not run – missing on PATH, killed by signal, or other runner-level failure) | {"error": "invocation_failed", "detail": "command failed: upsc ups: <reason>"} | 1 |
| UPS not enabled | {"error": "ups_not_enabled"} | 0 |

If error or warning is present, do not treat the typed body as healthy UPS state. For these cases, --json writes only to stdout – stderr stays silent so the JSON sentinel can be piped into a single sink (jq, tee, CI logs) without a redundant human error line. Other failure modes, such as malformed config, still print a human error to stderr.

The converse does not hold: the absence of error and warning does not by itself mean the UPS is online – inspect status_flags as above. ups_status_empty fires only when ups.status is empty or missing (no flags to read), so it is not a general health signal.
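A script consuming the JSON might triage a body along these lines. This is a sketch under the rules on this page, with a hypothetical `classify` helper; it deliberately inspects the body rather than the exit code, because ups_not_enabled is an error body that still exits 0.

```python
import json


def classify(raw):
    """Sort a `braid ups status --json` body into one of three buckets.

    Returns ("error", code), ("warning", code), or ("telemetry", None).
    "telemetry" only means the body is trustworthy; whether the UPS is
    on utility power still has to be read from status_flags.
    """
    body = json.loads(raw)
    if "error" in body:
        return ("error", body["error"])
    if "warning" in body:
        return ("warning", body["warning"])
    return ("telemetry", None)
```

Only the "telemetry" bucket should be passed on to code that reads typed fields, since error bodies carry none of them.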

Flags

| Flag | Effect |
| --- | --- |
| --json | Emit parsed upsc model as JSON; stable shape for scripts |

See also

  • UPS guide – shutdown path, preflight refusal, v1 limitations
  • tui – the TUI’s Data tab shows the same live UPS state
  • doctor – UPS-adjacent configuration checks

Principles

Canonical invariants for braid. Each principle is authoritative — if code or config contradicts a principle, the code is wrong.

1. Resilient by default

Data drives never block boot. The pool is unlocked and mounted by explicit CLI invocations (braid unlock, the braid-auto-unlock.service unit, or braid recover during recovery), not by systemd mount units. No LUKS or btrfs units are generated at build time. Degraded mounts require explicit --allow-degraded — braid refuses to silently run with zero redundancy. Why →

2. CLI-owned membership

Disk membership is runtime state owned by the CLI, stored in /var/lib/braid/pool.json. Adding or removing a drive is braid add name=/dev/disk/by-id/... — no nixos-rebuild required. The NixOS module provides the mount point, services, and toolchain; the CLI owns which disks are in the pool. unlock requires pool.json to exist and be valid — it never creates or repairs it. Recovery is explicit via braid discover --write. Why →

pool.json is a best-effort operational snapshot — it tells braid which drives to attempt unlocking, not what the pool actually looks like. Any state that can be read from live btrfs (devids, device counts, FSID) must come from btrfs, not pool.json. Commands like status must never surface pool.json-sourced devids; for display authority, devids are authoritative only when read from a mounted filesystem via btrfs device usage or equivalent. Persisted DiskMember.devid carries prior-binding authority only: when live btrfs reports a device by devid alone (the null_underlying mapper case and the btrfs missing_devids case), the persisted devid is the authorized fallback binding for re-attaching that live device to its membership entry. This is not a display-side use of pool.json devid; status output continues to draw devids from live btrfs. Why →

3. Safe-by-construction operations

  • Each intent command (add, remove, remove-missing, replace) does exactly one thing with risk-appropriate confirmation. replace always uses btrfs replace start — for live disks it replaces in-place, for missing disks it rebuilds from RAID redundancy using the missing device’s devid. remove-missing cleans up a stale missing-device entry; it never rebuilds data onto a new device (that is replace). When clearing the last missing device with ≥2 devices remaining, both remove-missing and replace (missing path) run a follow-up soft balance to restore RAID1 profiles for chunks written during degraded operation.
  • Post-commit persist with journal: mutating commands write a pending-operation journal (pending-op.json) with pre/target membership snapshots before the first irreversible disk operation. pool.json is written once the btrfs membership change has committed, so it reflects committed live membership, not necessarily completion of follow-up maintenance such as RAID1 rebalance or resize. Phased journals advance to post-maintenance after the committed pool.json write; those post phases must never rerun the primary btrfs membership mutation. The journal is cleared only after the entire lifecycle succeeds, including required post-mutation maintenance like soft balance. While the journal exists, braid recover replays owed maintenance when btrfs balance state is idle and fails closed with the journal preserved when owed RAID1 replay finds a crash-paused, running, or unknown balance state. If braid crashes or fails mid-operation, the journal triggers recovery mode: membership/mount/key-enrollment commands (add, remove, remove-missing, replace, unlock, enroll, discover --write) hard-fail; read-only diagnostic and cleanup surfaces (status, doctor, lock, bare discover) stay available. braid recover rebuilds membership from the live mounted pool (not LUKS label scanning) and is the only command that clears the journal.
  • Environment-side resource acquisition (file locks, sleep inhibitors, dbus/logind handshakes, external service availability) must happen before journal::write_journal. The journal write commits the user to recovery mode on any subsequent failure, so a pure environment failure (logind unreachable, flock contention) must not leave a stranded pending-op.json for what was conceptually a “command never started” failure. The journal write is the line of no return; reorder code so any RAII guards or environment probes that can fail are bound above it. The per-command pre-journal excluded scope (which also covers reversible validation and identity checks) is enumerated in ADR 019.
  • Disk names are immutable once recorded in pool membership; name rename/reassignment is rejected by mutating commands and must use explicit replace or remove+add workflows.
  • mkfs.btrfs is gated on bootstrap only – bootstrap accepts only disks classified as fresh non-LUKS during add planning, and the LUKS open helpers verify that any pre-existing braid-<name> mapper is backed by the requested by-id disk before pool creation proceeds. mkfs.btrfs is invoked without -f so its own libblkid signature check is the final fail-closed guard.
  • An existing LUKS device or pool member is never reformatted — a multi-layer identity check (LUKS label match, LUKS UUID cross-check against pool.json, pool-mounted requirement, btrfs FSID comparison) prevents accidental data loss, with the btrfs superblock guard as defense-in-depth.
  • Failed unlock and recovery mount paths close only LUKS mappers braid newly opened during that invocation. They never close pre-existing operator-owned mappers, including mappers that become already open between planning and execution.
  • Mounts always include skip_balance — btrfs silently resumes interrupted balances on mount by default, which can re-trigger ENOSPC or surprise the user with heavy I/O. braid manages balance lifecycle explicitly; unlock warns if a paused balance is detected.
  • The bare pool mountpoint is sealed immutable (chattr +i) while the pool is offline, so a process writing it before mount fails with EPERM instead of silently landing on the root filesystem and being shadowed when the pool mounts. The seal is always-on (no knob), lives only in the boot/activation unit (braid-seal-mountpoint), and persists across lock/unlock. Why →
  • Dry-run previews for migrated mutating commands are rendered from the same typed work plans that execution consumes; Step is output-only. Why →
  • Why →
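The recovery-mode gating described in the journal bullet above can be summarized as a small lookup. This Python sketch only restates the command lists given there; the function name is hypothetical, and commands the text does not classify are rejected rather than guessed.

```python
# Commands that hard-fail while pending-op.json exists (from the list above).
BLOCKED = {"add", "remove", "remove-missing", "replace", "unlock", "enroll"}
# Read-only diagnostic and cleanup surfaces, plus recover itself.
ALLOWED = {"status", "doctor", "lock", "recover"}


def allowed_with_pending_journal(command, write=False):
    """Return True if `command` may run while a pending-op journal exists."""
    if command == "discover":
        return not write  # bare discover stays available; --write hard-fails
    if command in BLOCKED:
        return False
    if command in ALLOWED:
        return True
    raise ValueError(f"{command!r} is not classified by the principle text")
```

The asymmetry for discover is the point worth noticing: the scan is always safe, but writing pool.json would compete with the membership snapshot that braid recover is about to rebuild.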

4. Single passphrase

All drives share one LUKS passphrase. braid unlock and braid add depend on this — one passphrase unlocks all drives. Before any irreversible operation, every reachable existing LUKS device that will remain in or enter post-operation pool membership has its slot 0 verified. Fresh-format disks are excluded because they have no existing slot 0. The live-replace source is excluded when other retained members exist, so a divergent slot 0 on the disk being replaced does not block its own replacement. The same all-relevant-disk rule applies to keyfile credentials used by mount, unlock, and recover. Why →

Binary keyfile support is available via braid enroll (slot 1) and braid.autoUnlock (NixOS module). The passphrase (slot 0) is the default interactive-unlock mechanism; the slot-1 keyfile drives braid.autoUnlock for unattended boots and can also be passed directly to braid unlock --key-file.

5. Stable identifiers

All persistent storage config uses /dev/disk/by-id/ paths. Never /dev/sdX. Mapper names are braid-<disk-name> (e.g., braid-toshiba) — deterministic, human-friendly, debuggable in lsblk, systemd logs, and error messages. LuksUuid is the primary persistent identity for code; the disk name and the LUKS label are presentation; by_id is for hardware addressing. When the live LUKS UUID is unobservable for a device the kernel/btrfs still reports (null_underlying mapper, btrfs missing_devids), btrfs devid is the only authorized live-fallback binding key. No code path may decide membership, target a device, or correlate live pool state by parsing a name out of a mapper path or LUKS label, except in two narrow cases: discover bootstrapping a UUID-keyed membership from cold disks, and returning-disk adoption safety in add (the PresentLuks path may gate adoption on label match, but identity correlation still uses LuksUuid/devid/FSID). Why →

6. btrfs RAID1

Auto-healing checksums, dynamic drive pooling, in-kernel (no out-of-tree modules). 50% space overhead is accepted. btrfs RAID5/6 is not production-ready. Why →

7. Sane defaults

If a knowledgeable admin would always enable it, braid enables it by default. Use lib.mkDefault for simple pass-through defaults on stable NixOS options. Wrap the setting in a braid.* option when the feature is inside braid’s product boundary and benefits from lifecycle control, discoverability, or a unified config surface – even if the mapping is 1:1. Examples: braid.autoScrub (periodic scrub with lifecycle binding to pool online state), poolAccessGroup for mount root access (root:storage 2770). Why →

8. Test every design decision

NixOS VM tests validate behavior, not just command success. TDD: write failing tests first, confirm they fail for expected reasons, then implement.

9. NixOS-native

Braid only targets NixOS. No portability abstractions, no generic Linux fallbacks. Follow NixOS module conventions — same option types, patterns, and idioms as nixpkgs. When in doubt, nixpkgs is the tiebreaker. Why →

10. Pinned toolchain

Parser-critical tools (btrfs-progs, cryptsetup, util-linux, NUT, smartmontools, ethtool) are pinned to a specific NixOS stable release via the flake input. Wrappers execute with an explicit PATH built from module-controlled packages (braid.packages.*). Parsers assume the output format of the pinned version – upgrading those tools requires updating fixtures and parser tests. These pinned defaults are a compatibility baseline, not a lock; users may override braid.packages.* to pick up newer system versions when needed. Generic helpers (coreutils, systemd) come from the consumer’s package set and are not part of braid’s parser contract, except that Browse parses systemctl list-units --output=json as a tolerant UI-only picker with raw-output fallback. Why →

11. HDD defaults

Mount options, LUKS flags, and scrub scheduling are chosen for HDD NAS deployments. Why →

12. One pool operation at a time

Rust dispatch acquires /run/braid-pool.lock before loading config, loading pool.json, probing pool state, or prompting for command input. The authoritative command-to-lock-discipline mapping lives in lock_policy in cli/src/main.rs; its wildcard-free exhaustive match makes every Commands variant choose a discipline at compile time.

Lock disciplines are policy categories, not prose-maintained command lists. Interactive mutators acquire non-blocking and fail fast with braid: another braid operation is already in progress so the user can retry once the active operation completes. Short-contention maintenance paths may wait for a bounded timeout, such as the 10-second alert acknowledgement window. Timer-driven monitoring may exit 0 silently on contention because a missed cycle is harmless and exit 1 would falsely start alert notification. Read-only paths and dry-run modes do not acquire the lock; bare discover is read-only, while its write mode participates because the scan -> pool.json write window must be serialized against pool-state mutators. Read-only diagnostics status and doctor never acquire the lock so operators retain a working diagnostic surface during contention; tests/module/pool-lock-readonly-bypass.py pins this invariant.
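The difference between the disciplines is what the caller does after a failed non-blocking acquire. The Python sketch below illustrates that with flock(2) on a temporary file standing in for /run/braid-pool.lock. It is not braid's Rust code: the helper names are hypothetical, and the use of flock specifically is an assumption based on the flock contention mentioned under principle 3.

```python
import fcntl
import os
import tempfile
import time


def try_lock(path):
    """Non-blocking exclusive flock; returns an fd, or None on contention."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


def lock_with_timeout(path, timeout_secs, poll_secs=0.05):
    """Bounded wait: retry the non-blocking acquire until the deadline."""
    deadline = time.monotonic() + timeout_secs
    while True:
        fd = try_lock(path)
        if fd is not None or time.monotonic() >= deadline:
            return fd
        time.sleep(poll_secs)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "pool.lock")   # stand-in for the real lock file
        holder = try_lock(path)                 # first caller wins
        contended = try_lock(path)              # fail-fast caller sees contention
        waited = lock_with_timeout(path, 0.2)   # bounded wait gives up as well
        os.close(holder)                        # closing the fd releases the lock
        retry = try_lock(path)
        result = (holder is not None, contended, waited, retry is not None)
        os.close(retry)
        return result
```

An interactive mutator would turn a None from `try_lock` into the "another braid operation is already in progress" error, a short-contention path would call `lock_with_timeout`, and a timer-driven monitor would treat None as a reason to exit 0 quietly.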

Mutual exclusion is enforced at the critical section itself, not via systemd unit topology. Under the held lock, unlock re-checks whether the pool is already mounted and exits cleanly if a prior winner mounted it sequentially; other mutators operate on the current locked state rather than stale pre-lock observations.

13. Announce long-running work

Every interactive command emits a [wait] row before any subprocess that can stall the terminal long enough for the user to wonder whether the CLI has hung. The bound categories:

  • cryptsetup Argon2 operations (luksFormat, luksOpen, luksAddKey, --test-passphrase);
  • cryptsetup close (single attempt or busy-retry loop);
  • btrfs balance, replace, and device remove (potentially hours);
  • mount and umount (kernel can drain in-flight I/O / replace workers / inhibitors).

A [wait] row is closed by one of:

  • the same command’s paired success row ([ok] {same subject}: ...) on the success path,
  • a same-subject [fail] row on a known failure path (e.g. lock.rs’s umount failure),
  • a same-subject [warn] row on a non-fatal best-effort failure (e.g. mapper_close::close_mapper_best_effort’s LUKS close, or wait_for_kernel_replace_to_finish’s status-poll error — the command continues despite the failure, and the warn row tells the user the wait window is closed without success),
  • a same-subject [skip] row on a successful negative or no-op probe (e.g. braid enroll’s pre-mutation keyfile probe finding the keyfile not yet enrolled — the work the wait announced completed, the answer is “no work yet”),
  • or the command’s normal error output (MountError / LuksError / PoolError propagation) on uncaught error paths.

A [wait] followed by none of these closers (i.e., success, fail, warn, skip, or non-zero exit) is a documentation bug.
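The closing rule lends itself to a mechanical check over captured output. The following Python sketch assumes rows have already been parsed into hypothetical (tag, subject) pairs; the parsing itself and the function name are not part of braid.

```python
# Row tags that close a same-subject [wait] row.
CLOSERS = {"ok", "fail", "warn", "skip"}


def unclosed_waits(rows, exit_code):
    """Return subjects whose [wait] row was never closed.

    `rows` is a list of (tag, subject) pairs in emission order. A non-zero
    exit stands for the command's normal error output, which closes any
    wait that is still open.
    """
    open_waits = []
    for tag, subject in rows:
        if tag == "wait":
            open_waits.append(subject)
        elif tag in CLOSERS and subject in open_waits:
            open_waits.remove(subject)
    return [] if exit_code != 0 else open_waits
```

A closer for a different subject does not count, so `[wait] mount` followed only by `[ok] luks-open` and a zero exit is reported as unclosed.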

Fast bookkeeping that completes well under a second (mkfs.btrfs on a fresh disk, btrfs device add, btrfs filesystem resize, btrfs device scan, btrfs device scan --forget, cryptsetup luksHeaderBackup, cryptsetup status, blkid, JSON parses, journal writes, pool.json saves, sysfs reads) does not warrant a row.

Rendering uses status_tag::status_line(StatusTag::Wait, ...) against color_enabled_for_stderr() so plain stderr captures contain unwrapped [wait] bytes and TTY output picks up the gray ANSI tag. Why →


Implementation workflow and conventions are in AGENTS.md at the repo root.

Decision: btrfs RAID1

Principle: btrfs RAID1

Context

The NAS needs checksumming (bit rot detection), self-healing (automatic repair from redundant copy), and dynamic drive pooling (add/remove drives without reformatting). The filesystem sits on top of LUKS.

Options considered

ZFS raidz

  • Checksumming + self-healing + RAID. Mature and well-tested.
  • Rejected: out-of-tree kernel module. Licensing conflict means it can never be mainlined. NixOS supports it but it’s a second-class citizen — kernel updates can break the module, and the build dependency is heavy.

btrfs RAID5/6

  • Same benefits as RAID1 with less overhead (parity instead of mirroring).
  • Rejected: not production-ready. The write-hole bug has been a known issue for years. Data loss reports exist. The btrfs wiki explicitly warns against it.

SnapRAID + mergerfs

  • Parity-based protection with independent drives. ~75% space efficiency with 3+1.
  • Rejected: no auto-healing. SnapRAID syncs on a schedule (e.g., nightly). Bit rot between syncs is undetected. No checksumming on read. Drives are independent ext4 — good for recovery but no real-time protection.

btrfs RAID1

  • Checksums every block on read, heals from the RAID1 copy automatically. Dynamic pool — btrfs device add/remove at any time with any size drive. In-kernel, first-class NixOS support. Simple stack: LUKS + btrfs.
  • Accepted.

Decision

btrfs RAID1. The 50% space overhead is accepted as the cost of real-time auto-healing with a simple, in-kernel stack.

Tradeoffs accepted

  • 50% space overhead — 3x 12TB = ~18TB usable. Parity schemes would give ~24TB.
  • Fixed 2-way redundancy — btrfs RAID1 keeps exactly 2 copies of every block, regardless of pool size. A 3- or 4-drive pool tolerates one drive failure, the same as a 2-drive pool. Additional drives buy usable capacity, not extra fault tolerance. Higher-redundancy profiles (RAID1C3, RAID1C4) exist in btrfs but are not used by braid — the product’s redundancy story is “tolerate one drive failure.”
  • No drive independence — drives are part of a btrfs pool, not individually mountable. Recovery requires a working btrfs toolchain.
  • Rebalancing cost — adding or removing a drive triggers a balance operation that can take hours on large pools.
  • Incremental growth — start with 1 drive (single profile, no redundancy), add a second to convert to RAID1. This is a feature, not a tradeoff — data is available immediately, protection comes when the second drive arrives.
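The capacity figures above follow from simple arithmetic, sketched here in Python. The estimate ignores metadata and chunk-allocation granularity, and both function names are made up for the example. For mixed drive sizes, two-copy RAID1 is limited either by half the total or by what fits alongside the largest drive, whichever is smaller.

```python
def raid1_usable_tb(sizes_tb):
    """Rough usable capacity with two copies of every block."""
    total = sum(sizes_tb)
    if len(sizes_tb) < 2:
        return total  # one drive: single profile, no redundancy
    # Each block needs copies on two different drives, so the largest
    # drive can only be filled as far as the others can mirror it.
    return min(total / 2, total - max(sizes_tb))


def single_parity_usable_tb(sizes_tb):
    """Rough usable capacity for one drive's worth of parity.

    Assumes equal-size drives; with mixed sizes this uses the smallest.
    """
    return (len(sizes_tb) - 1) * min(sizes_tb)
```

With three 12 TB drives this gives 18 TB for RAID1 and 24 TB for single parity, matching the numbers in the first tradeoff. A mismatched pair such as 4 TB plus 12 TB yields only 4 TB under RAID1, because the second copy of every block has to live on the small drive.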

Replacement strategy

Device replacement always uses btrfs replace start, including when the source device is missing. btrfs replace start <devid> supports replacing by devid when the source is unavailable, rebuilding from RAID1 mirrors. This is preferred over the alternative btrfs device add + btrfs balance + btrfs device remove approach because:

  1. No degraded balance: btrfs docs explicitly warn against balancing a degraded filesystem to lower redundancy. btrfs replace avoids this entirely.
  2. Devid preservation: the new device inherits the old devid, keeping the pool topology stable.
  3. Single operation: one btrfs replace start call vs. three separate commands with partial-failure risk.

braid remove-missing is retained for cleanup only (forgetting stale device entries), not for replacement. When braid blocks a live replacement because the pool has missing devices, the intended next step is repairing the missing device via braid replace --old <missing-name> --new <new-name>=/dev/disk/by-id/<...> (the missing devid auto-resolves from --old), not forgetting it.

See

  • cli/src/cmd.rs – base_mount_options() and the btrfs mount invocation
  • tests/storage/btrfs-heal.nix — validates auto-healing
  • tests/storage/btrfs-grow1.nix, tests/storage/btrfs-shrink.nix — validates dynamic pooling

Decision: Config-First Workflow

Superseded by 017-runtime-disk-membership.md.

Principle: CLI-owned membership (successor)

Context

NixOS is declarative — nixos-rebuild switch should describe the system’s desired state. But cryptsetup luksFormat and btrfs device add are destructive one-shot operations that cannot be made idempotent. Re-running them would destroy data.

Options considered

  1. Fully declarative — module handles formatting in an activation script. Simple but catastrophic if re-run.
  2. Fully imperative — script manages everything, module reads live state. Works but creates config drift (disk formatted but not in NixOS config; pool unlock breaks).
  3. Config-first hybrid — declare disk in NixOS config (source of truth), rebuild (creates LUKS entries that fail gracefully), then run imperative script to format. Script refuses undeclared disks.

Decision

Option 3. The NixOS config is the source of truth. The script is a one-shot executor that reads from it.

Workflow

  1. Add disk to braid.disks
  2. nixos-rebuild switch — module exports /etc/braid/config.json, creates LUKS entries (which fail gracefully since disk isn’t formatted yet)
  3. sudo braid init-disk /dev/disk/by-id/... — reads config, verifies disk is declared, formats LUKS (explicit, one-shot)
  4. sudo braid apply — opens LUKS, adds to btrfs pool, balances to RAID1 if applicable
  5. Next reboot auto-unlocks

Config export

The module writes /etc/braid/config.json via environment.etc. This is the single Nix→runtime bridge. All CLI tools read it by default. The file is built at nixos-rebuild time and is read-only at runtime.

Config drift prevention

The script refuses to format disks not listed in braid.disks. Error message tells the user exactly what to add and which commands to run. This ensures every formatted disk has a corresponding LUKS entry for pool unlock.

Symmetric guards

Config-first applies to all pool operations, not just add. The guard works in both directions:

  • braid init-disk refuses disks not in braid.disks
  • braid apply removes disks from pool when they are no longer in braid.disks

Remove workflow: remove disk from braid.disks → nixos-rebuild switch → sudo braid apply. See 007-disk-pool-management.md for full spec.

Constraint

Two-step process (rebuild + run script) instead of a single rebuild. This is the minimum viable approach given that LUKS formatting is destructive.

Revisit trigger

If NixOS ever gets a formatDevice option type that can safely express one-shot destructive operations, this deviation can be revisited.

See

  • modules/braid/options.nix – braid.disks option definition
  • modules/braid/storage.nix — config export and LUKS entry generation
  • cli/src/ — Rust CLI (init-disk, plan, apply, status)
  • archive/design-docs/1-nixos-best-practices.md — original best practices analysis (preserved in git history; last present at commit 9df91f9)

Decision: Resilient by Default

Principle: Resilient by default

Context

The OS lives on an internal SSD. Data drives are separate. Nothing about the data drives — bad config, dead drive, unplugged cable — should prevent the system from booting. The data pool is an external resource, like a network mount. The module tries to bring it up, but if it fails, the box is still a working Linux machine you can SSH into and fix.

Options considered

  1. Hard dependencies — LUKS required, mount required. Any failure blocks boot. Simple but means a dead drive = unreachable NAS.
  2. Degraded toggle — add an option like braid.allowDegraded = true. Default to hard failure, opt in to resilience. Adds complexity and a wrong default.
  3. Resilient by default – nofail, wants everywhere. Zero cost when healthy, graceful in every failure case. No toggle. Degraded mounts require explicit opt-in (--allow-degraded or autoUnlock.allowDegraded) to prevent silent zero-redundancy operation.

Decision

Option 3. Resilience is the default, not an option.

Implementation

LUKS unlock is strictly stage-2 — braid-unlock or braid-auto-unlock opens LUKS and mounts the pool. The module does not generate boot.initrd.luks.devices, data-pool fileSystems entries, or LUKS device declarations. The pool is brought online entirely by the CLI at runtime.

Resilience mechanisms:

  • No boot-blocking mount units: The module generates no data-pool fileSystems or LUKS entries. The CLI (braid unlock) opens LUKS and mounts btrfs directly with a plain mount call, so nothing referencing data drives can block boot. Mounting outside systemd also sidesteps the SYSTEMD_READY=0 udev quirk (systemd/systemd#36886): a missing btrfs member can mark surviving devices not-ready and stall a systemd-initiated mount — the exact failure resilience-by-default exists to prevent. Related coverage: tests/repro/udev-missing-disk-{io,idle}.py exercise udev events when a member disappears from an already-mounted pool, characterizing disappearance signals rather than the SYSTEMD_READY=0 mount-gating path. (The one build-time fileSystems entry is the optional autoUnlock USB-key mount at /run/braid-key/mnt, marked noauto/nofail so it never blocks boot and references the key device, not the pool.)
  • Degraded mount: Requires explicit --allow-degraded (or autoUnlock.allowDegraded for unattended use) — braid refuses to silently mount with zero redundancy.

Three-tier failure model

| Scenario | What happens | User sees |
| --- | --- | --- |
| All drives healthy | Normal boot | Everything works |
| One drive dead | braid unlock refuses by default; user must pass --allow-degraded or configure autoUnlock.allowDegraded | Pool stays locked until explicit opt-in |
| All drives dead / no pool.json | braid unlock fails (no devices to probe) | System boots, SSH works, no /mnt/storage |
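The three tiers can be read as a decision function. The Python sketch below models only the rows of the table; the function name, the return strings, and the set-of-names inputs are assumptions for illustration, and any case the table does not cover raises instead of guessing.

```python
def unlock_outcome(members, present, allow_degraded=False):
    """Model the three-tier failure table.

    `members` lists disk names from pool.json (None or empty if the file
    is missing); `present` is the set of member names that are reachable.
    """
    if not members:
        return "fail: no devices to probe"
    missing = [m for m in members if m not in present]
    if not missing:
        return "mount"
    if len(missing) == len(members):
        return "fail: no devices to probe"
    if len(missing) == 1:
        if allow_degraded:
            return "mount degraded"
        return "refuse: pass --allow-degraded"
    raise ValueError("more than one missing member is not covered by the table")
```

In every branch the machine itself still boots; the function only describes what happens to the pool.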

Identity enforcement

braid unlock uses authoritative pool membership from pool.json and probes only those configured members. --allow-degraded only bypasses degraded-mount refusal; it does not change which disks are considered pool members.

Constraint

This is not configurable. There is no braid.resilient option. Every braid deployment gets resilient boot.

See

  • modules/braid/storage.nix – braid-online.service, braid-pool.target
  • tests/module/ — module tests validate boot with all drives healthy
  • archive/plans/test-boot-degraded.md — original plan and research (preserved in git history; last present at commit 9df91f9)

Decision: Single Passphrase

Principle: Single passphrase

Context

braid unlock and braid add prompt for a passphrase that unlocks all LUKS devices. If each drive had a different passphrase, the user would need to type N passphrases on every unlock. The UX must be: one passphrase, all drives unlock.

Options considered

  1. Shared keyfile on boot disk — store a keyfile on the SSD, encrypt the SSD with a passphrase. Unlocking the SSD exposes the keyfile, which unlocks data drives. More complex boot chain, keyfile is at-rest on disk.
  2. Same passphrase, no enforcement — tell users to use the same passphrase. They’ll forget or mistype. Boot breaks silently.
  3. Same passphrase, enforced at format time – braid add verifies the passphrase matches relevant existing pool members before formatting. Catches mismatches immediately.

Decision

Option 3. Enforcement at format time.

How it works

  • First disk: prompt for passphrase twice (confirm match). Standard new-passphrase flow.
  • Subsequent disks: prompt once, then verify every reachable existing LUKS device that will remain in or enter post-operation pool membership via cryptsetup luksOpen --test-passphrase. Fresh-format disks are excluded because they have no existing slot 0. The live-replace source is excluded when other retained members exist, so a divergent slot 0 on the disk being replaced does not block its own replacement. If verification fails, refuse to proceed with a clear error.
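The selection of verification targets described above can be sketched as a filter. This Python model is an interpretation, not braid's code: the `Disk` fields and `slot0_targets` are hypothetical, and the fallback to the live-replace source when no other retained member exists is an assumption inferred from the "when other retained members exist" wording.

```python
from dataclasses import dataclass


@dataclass
class Disk:
    name: str
    reachable: bool = True
    fresh_format: bool = False          # will be newly formatted, no slot 0 yet
    in_result_membership: bool = True   # remains in or enters the pool afterwards
    live_replace_source: bool = False   # the live disk being replaced


def slot0_targets(disks):
    """Return names of disks whose slot 0 is checked before the operation."""
    candidates = [d for d in disks if d.reachable and not d.fresh_format]
    retained = [d for d in candidates
                if d.in_result_membership and not d.live_replace_source]
    if retained:
        return [d.name for d in retained]
    # Assumption: with no other retained member, the source is the only
    # existing LUKS device left to verify against.
    return [d.name for d in candidates if d.live_replace_source]
```

With a retained member "a", a live-replace source "b", and a fresh disk "c", only "a" is verified, so a divergent passphrase on "b" does not block replacing it.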

Finding a verification target

The CLI reads which devices are in the btrfs pool and verifies the supplied passphrase against each relevant underlying LUKS device before opening or mutating disks. The same result-membership rule applies to keyfile credentials used by mount, unlock, and recover: every planned LUKS target is verified before any mapper is opened with that credential.

For add, this widened preflight changes one mixed-failure precedence case. If a non-first pool member has a divergent slot 0 and a closed PresentLuks { mapper_open: false } candidate would later surface a foreign-FSID or no-btrfs identity error during execute Pass 1, the pool-member credential error now wins. The old identity-first ordering in that shape came from the former first-disk-only verify and was not a documented invariant. A divergent slot 0 is a pool-wide integrity issue that affects future operations; surfacing it before the candidate’s one-off selection error is intentional.

Identity errors found during planning are unchanged. The add work-plan builder still validates every PresentLuks candidate’s braid label and classifies already-open candidates before AddPlan::execute runs, before any passphrase read, and before the widened verify.

Scope

This decision governs the shared passphrase: one passphrase, enrolled in LUKS key slot 0 on every pool disk, enforced at format time. Additional unlock mechanisms (USB keyfiles, TPM, etc.) are orthogonal — they use separate LUKS key slots and do not weaken or replace the passphrase requirement.

See

  • cli/src/ — passphrase prompt and verification logic in the Rust CLI
  • design-docs/1-braid-add-disk.md — original script design (preserved in git history; last present at commit 4112e57)

Decision: Sane Defaults

Context

Braid should protect the user’s data without requiring them to read through NixOS options to find features worth enabling. If a setting is something every NAS should have, braid should turn it on automatically.

The guiding question: would a knowledgeable admin always enable this? If yes, braid enables it by default.

Decision

Braid sets opinionated defaults two ways: lib.mkDefault for simple pass-through defaults on stable NixOS options, and a braid.* wrapper option when the feature is inside braid’s product boundary and benefits from lifecycle control, discoverability, or a unified config surface — even if the mapping is 1:1. The two cases below say which applies.

When to use mkDefault (don’t wrap)

Use lib.mkDefault to set an underlying NixOS option directly when:

  • The NixOS option is stable and well-known — wrapping it adds no clarity.
  • The meaning doesn’t change if braid’s internals change.
  • The mapping is 1:1 and braid doesn’t need lifecycle control.

The user overrides by setting the NixOS option in their own config. mkDefault gives way automatically.

When to wrap in a braid option

Create a braid.* option when:

  • One braid option maps to many underlying options — e.g., braid.autoUnlock sets a fileSystems mount entry for the USB key, a braid-auto-unlock.service, systemd.tmpfiles rules, and assertions.
  • The underlying tech could change — the abstraction survives an implementation swap.
  • The raw option requires braid-specific context — e.g., the pool membership encodes LUKS + mapper naming conventions. Exposing the raw options would require the user to understand braid’s internals.
  • The mapping is non-obvious or must stay in sync — e.g., if braid supported multiple pools, scrub fileSystems would need to track all mount points automatically.
  • Braid needs lifecycle control — the feature must be tied to the pool’s online state, not the host system’s always-on timers. Example: braid.autoScrub wraps a 1:1 mapping but needs the timer bound to braid-online.service so Persistent=true catches up missed scrubs after unlock.

Defaults applied

| Setting | Value | Rationale |
| --- | --- | --- |
| braid.autoScrub.enable | true | Scrub detects bit rot before it compounds. Every NAS should do this. Wrapped in a braid option for lifecycle binding to braid-online.service. |
| braid.autoScrub.interval | "monthly" | Btrfs community consensus. Weekly is aggressive for spinning disks; quarterly risks undetected corruption on a small RAID1. TrueNAS defaults to weekly (ZFS); Synology doesn’t enable it by default. Monthly is the sweet spot. |
| braid.poolAccessGroup | "storage" | Mount root set to root:storage 2770. Users in the group can read/write the mount root. Setgid ensures new entries inherit the group. Same pattern as TrueNAS/OMV. Does not override per-file umask. |

Alternatives considered

Wrap scrub in braid.autoScrub

Accepted (reversed). Initially rejected because wrapping a 1:1 mapping seemed like unnecessary indirection. Reversed because braid needs lifecycle control over the scrub timer: the timer must be bound to braid-online.service so it only runs while the pool is online, and Persistent=true can catch up missed scrubs on unlock. The nixpkgs services.btrfs.autoScrub timer fires on calendar boundaries regardless of pool state, causing silent failures when the pool is locked.

Don’t enable scrub by default

Rejected. This is what Synology and Unraid do — scrub is opt-in. Users who don’t know about scrub never enable it. Braid’s philosophy is that data integrity features should be on by default.

Weekly scrub (TrueNAS default)

Rejected. TrueNAS runs ZFS on always-on servers. Braid targets home NAS with spinning disks where weekly scrubs add unnecessary wear and noise. Monthly catches bit rot well before it can compound across a 2-3 drive RAID1.

See

  • modules/braid/options.nix — declares the option defaults (braid.autoScrub, braid.poolAccessGroup)
  • modules/braid/storage.nix — realizes braid.autoScrub into the scrub lifecycle units (braid-scrub timer/service and braid-scrub-resume-trigger), all bound to braid-online.service
  • cli/src/online_state.rs — mark_online() applies the mount-root permissions from braid.poolAccessGroup (root:<group> 2770)
  • Resilient by default — related philosophy: protect by default, no toggles

Decision: NixOS-native

Context

Braid is a NixOS module. It only targets NixOS — no portability goal for other distros, container runtimes, or generic Linux. Every design decision can assume the full NixOS ecosystem is available.

Decision

Braid follows NixOS module conventions — same option types, module patterns, and idioms as nixpkgs. When in doubt, nixpkgs is the tiebreaker.

What this means in practice

  • Options use standard lib.mkOption types (lib.types.listOf, lib.types.attrsOf, lib.types.submodule, etc.) — not custom validation or string parsing.
  • Activation uses systemd units, not custom init scripts or cron jobs.
  • Dependencies use systemd ordering (after, wants, requires), not polling or sleep loops.
  • Defaults use lib.mkDefault / lib.mkForce priority, not conditional logic.
  • Config generation uses NixOS module merge semantics — not imperative file templating.
  • No portability shims. Use NixOS mechanisms (boot.initrd.network, environment.etc, systemd.services) directly.

Tiebreaking

When two approaches both work:

  1. Check how nixpkgs modules handle the same problem.
  2. If no precedent, prefer whichever composes better with the NixOS module system.

Alternatives considered

Support other distros via abstraction layers

Rejected. Braid’s value comes from deep NixOS integration — declarative disk config, reproducible builds, VM-tested infrastructure. The target user already runs NixOS.

Use generic Linux tooling where possible

Rejected. “Generic” means reimplementing what NixOS already provides (shell scripts instead of systemd units, config files instead of NixOS options) — more maintenance, no atomicity or rollback.

See

Superseded by 012-intent-cli.md and 017-runtime-disk-membership.md.

Decision: Disk Pool Management

Principle: CLI-owned membership

Context

braid-add-disk exists and is tested. The pool still needs graceful disk removal, status reporting, and a clear replace workflow. All operations must follow the same config-first pattern: edit braid.disks → nixos-rebuild switch → run CLI tool.

Principle: config-first applies symmetrically

Config-first is not just for adding disks. Every pool mutation follows the same workflow:

  • Add: declare disk in braid.disks → rebuild → braid-add-disk
  • Remove: remove disk from braid.disks → rebuild → braid-remove-disk
  • Replace: remove dead disk + add replacement in braid.disks → rebuild → braid-add-disk

Symmetric guards enforce this:

  • braid-add-disk refuses disks not in config
  • braid-remove-disk refuses disks still in config

braid-remove-disk spec

Three-tier logic

  1. Target mapper exists and is open, verified to map to the requested by-id disk → graceful btrfs device remove /dev/mapper/xxx (migrates data off the device)
  2. Target is absent/unopenable and pool shows a missing device → btrfs device remove missing
  3. Otherwise → fail with clear diagnostic
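The three tiers read naturally as a single decision function. The sketch below is a hypothetical model of that logic; `TargetState`, `RemoveAction`, and `choose_remove_action` are illustrative names, and the real script may probe state differently.

```rust
/// Hypothetical snapshot of what the script knows about the requested disk.
struct TargetState {
    mapper_name: String,
    /// /dev/mapper/<name> exists and is open.
    mapper_open: bool,
    /// The open mapper was verified to sit on the requested by-id disk.
    mapper_matches_by_id: bool,
    /// btrfs reports a missing device in the pool.
    pool_reports_missing: bool,
}

#[derive(Debug, PartialEq)]
enum RemoveAction {
    /// Tier 1: btrfs device remove <mapper path>, data migrates off the device.
    Graceful { mapper_path: String },
    /// Tier 2: btrfs device remove missing.
    RemoveMissing,
    /// Tier 3: stop with a diagnostic.
    Refuse(&'static str),
}

fn choose_remove_action(t: &TargetState) -> RemoveAction {
    if t.mapper_open && t.mapper_matches_by_id {
        RemoveAction::Graceful {
            mapper_path: format!("/dev/mapper/{}", t.mapper_name),
        }
    } else if !t.mapper_open && t.pool_reports_missing {
        RemoveAction::RemoveMissing
    } else {
        RemoveAction::Refuse("target is neither a verified open mapper nor a missing pool device")
    }
}
```

Note that an open mapper that fails the by-id check falls through to the refusal tier rather than to remove missing, which keeps the ambiguous case from mutating the pool.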

Graceful remove is preferred when possible. It avoids relying on RAID1 reconstruction and eliminates ambiguity if more than one device is missing.

LUKS cleanup

After btrfs remove, cryptsetup close the mapper. Best-effort:

  • Success → print “disk fully released” (safe to physically pull)
  • Failure (busy) → print actionable next steps (lsof/fuser + retry), exit non-zero

No passphrase required

Remove does not need a passphrase. The disk is already unlocked or already gone. Root access + config guard + typed confirmation is sufficient.

Confirmation

Normal remove (pool stays RAID1 with 2+ disks):

Type 'remove this disk' to confirm:

Removing would drop below 2 disks (losing redundancy):

WARNING: This leaves 1 disk with no RAID1 redundancy.
A single disk failure will cause data loss.

Type 'remove this disk without redundancy' to confirm:

Warn but allow dropping to 1 disk — consistent with the single-disk start story.
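A minimal sketch of the phrase selection, assuming the script knows how many disks the pool would hold after the removal. The function names are hypothetical; the two phrases are the ones quoted above.

```rust
/// Phrase the operator must type, based on the disk count after removal.
fn required_phrase(disks_after_removal: usize) -> &'static str {
    if disks_after_removal >= 2 {
        "remove this disk"
    } else {
        "remove this disk without redundancy"
    }
}

/// Compare the typed line against the required phrase, ignoring only the
/// trailing line ending that a terminal read would include.
fn confirmed(disks_after_removal: usize, typed: &str) -> bool {
    let typed = typed.trim_end_matches(|c| c == '\n' || c == '\r');
    typed == required_phrase(disks_after_removal)
}
```

Because the match is exact, typing the shorter phrase when redundancy would be lost is rejected.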

Reboot-in-between safety

If the user reboots between nixos-rebuild switch (which removes the LUKS entry) and running braid-remove-disk, the disk won’t auto-unlock. This is safe: principle #1 (resilient boot) ensures the system boots and is reachable via SSH. The pool requires explicit --allow-degraded (or autoUnlock.allowDegraded) to mount degraded. The CLI handles both paths (tier 1 if disk is still somehow open, tier 2 if it’s absent).

braid-status spec

Default output

Pool health summary: drive count, RAID profile, total/used/free capacity, degraded/missing state, last scrub result. Per-disk detail: model, serial, mapper name, btrfs devid, read/write/corruption error counters, LUKS UUID, present/missing state.

--json

Machine-readable output for monitoring and automation.

Replace workflow

Replace uses braid-add-disk, which already auto-evicts missing devices during rebalance.

Workflow:

  1. Remove dead disk from braid.disks, add replacement
  2. nixos-rebuild switch
  3. sudo braid-add-disk /dev/disk/by-id/<new-disk>

Auto-evict is specifically for missing/dead devices. Planned removal of a healthy disk uses braid-remove-disk.

Future vision

Document only — do not build yet.

Unified CLI: braid plan (dry-run diff of config vs live state), braid apply (execute with checkpoints and resumability), braid status, braid replace-disk <old> <new>.

Phased roadmap:

  1. Ship braid-remove-disk and braid-status (solid primitives)
  2. Read-only planner (braid plan)
  3. Executor with checkpoints (braid apply)
  4. First-class braid replace-disk
  5. braid-status --json

Nix config remains source of truth throughout. The workflow evolves from edit → rebuild → script to edit → rebuild → plan → apply, but the principle is unchanged.

CLI shape

Separate scripts (braid-add-disk, braid-remove-disk, braid-status) — not a unified CLI yet. The unified braid command is future work that depends on proven primitives.

See

  • modules/braid/options.nix — braid.disks option definition
  • modules/braid/storage.nix — config export and LUKS entry generation
  • docs/design/decisions/002-config-first-workflow.md — original config-first decision

Superseded by 012-intent-cli.md.

Decision: Unified CLI with Plan/Apply

Principle: CLI-owned membership (successor)

Context

Braid had three standalone scripts (braid-add-disk, braid-remove-disk, braid-status). Each handled one operation with its own validation, pool probing, and confirmation flow. The config-first workflow (edit config → rebuild → run script) was sound, but operators had to choose the right script and remember its flags. All three are now replaced by the unified Rust CLI.

A unified braid command with plan (dry-run diff) and apply (execute with checkpoints) replaces the multi-script mental model with one flow: edit config → rebuild → plan → apply.

Options considered

  1. Keep separate scripts — add braid-plan as a fourth script. Simple but doesn’t unify the execute path or add checkpoint/resume.
  2. Go binary — full rewrite in Go. Better for complex state machines, but high migration risk and slower delivery for equivalent behavior.
  3. Bash+jq unified script — single braid dispatcher with subcommands. Reuses existing tested patterns. JSON plan/checkpoint formats work with jq.

Decision

Option 3. Initial implementation was bash+jq. Now replaced by Rust CLI (cli/).

Architecture

Rust CLI (cli/src/) with subcommand dispatcher:

  • braid init-disk <by-id> [--force] [--config <path>] — destructive one-shot: LUKS format a declared disk. Requires explicit operator intent. Never called from apply.
  • braid plan [--json] [--allow-remove-missing] [--allow-remove-ambiguous] [--config <path>] — read-only diff: desired state (config) vs live state (LUKS/btrfs/mounts). Outputs action list with status (applicable/blocked), warnings, and blocked reasons.
  • braid apply [--resume] [--allow-remove-missing] [--allow-remove-ambiguous] [--config <path>] — executes plan with checkpoint persistence. --resume continues from /var/lib/braid/apply-state.json. Never performs luksFormat.
  • braid status [--json] [--config <path>] — pool health summary with per-disk detail (replaces braid-status).
  • braid doctor [--json] [--config <path>] — run diagnostic checks against config and pool state (config file, schema, permissions, declared disks, data/metadata profile consistency). Reports ok/warn/fail per check.

Packaged via Crane + makeWrapper in flake.nix.

Hard boundary

cryptsetup luksFormat is forbidden in the plan and apply code paths. Only init-disk may invoke luksFormat. See 009-safe-by-construction-reconciliation.md.

Plan status model

Plan JSON includes:

  • status: applicable (can be executed) or blocked (requires operator action first)
  • blocked_reasons[]: list of reasons the plan cannot proceed (e.g., INIT_REQUIRED, IDENTITY_AMBIGUOUS_ABSENT_DISK)
  • warnings[]: non-blocking issues (e.g., DISK_ABSENT_SKIPPED, INIT_REQUIRED, POOL_DEGRADED)
  • confirmations[]: actions requiring explicit operator confirmation (e.g., redundancy loss). When multiple confirmations are required, provide all phrases semicolon-separated in BRAID_CONFIRM (e.g., BRAID_CONFIRM='phrase one;phrase two'). Whitespace around semicolons is trimmed.
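The BRAID_CONFIRM format described above (semicolon-separated phrases, whitespace around semicolons trimmed) could be parsed along these lines. Both function names are hypothetical, and dropping empty entries (for example from a trailing semicolon) is an assumption rather than documented behavior.

```rust
/// Split a BRAID_CONFIRM value into individual phrases.
/// Assumption: empty entries are ignored.
fn parse_confirmations(raw: &str) -> Vec<String> {
    raw.split(';')
        .map(str::trim)
        .filter(|p| !p.is_empty())
        .map(String::from)
        .collect()
}

/// Return the required phrases that the operator did not supply.
fn missing_confirmations<'a>(required: &[&'a str], raw: &str) -> Vec<&'a str> {
    let given = parse_confirmations(raw);
    required
        .iter()
        .copied()
        .filter(|r| !given.iter().any(|g| g.as_str() == *r))
        .collect()
}
```

A caller would read the environment variable with std::env::var("BRAID_CONFIRM") and refuse to run while missing_confirmations returns a non-empty list.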

Plan/apply state machine

  1. braid plan produces a JSON plan (action list with types, targets, preconditions)
  2. braid apply runs the planner internally, writes checkpoint, executes actions in order
  3. Each action updates the checkpoint atomically (write tmp + mv)
  4. On success: checkpoint moves to /var/lib/braid/history/<plan_id>.json, active file removed
  5. On failure: checkpoint stays for --resume
  6. --resume verifies config hash matches before continuing
  7. On resume, absent action targets fail with RESUME_TARGET_MISSING (strict in-flight integrity)
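Steps 3 and 4 rely on the usual write-then-rename pattern. The sketch below shows one way to do that with the Rust standard library; the function names and the temporary-file suffix are hypothetical, and the JSON payload is treated as an opaque string because the checkpoint schema is not reproduced here.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::Path;

/// Write the checkpoint to a sibling temp file, flush it, then rename it
/// over the active path. A rename within one filesystem replaces the target
/// atomically, so readers see either the old or the new checkpoint.
fn write_checkpoint_atomic(active: &Path, json: &str) -> io::Result<()> {
    // Hypothetical temp name: apply-state.json -> apply-state.json.tmp
    let tmp = active.with_extension("json.tmp");
    {
        let mut file = fs::File::create(&tmp)?;
        file.write_all(json.as_bytes())?;
        file.sync_all()?;
    }
    fs::rename(&tmp, active)
}

/// On success, move the active checkpoint into the history directory.
fn archive_checkpoint(active: &Path, history_dir: &Path, plan_id: &str) -> io::Result<()> {
    fs::create_dir_all(history_dir)?;
    fs::rename(active, history_dir.join(format!("{}.json", plan_id)))
}
```

Moving the file into history removes the active checkpoint in the same step, which matches the "active file removed" outcome in step 4.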

Action types

  • OPEN_LUKS — open existing LUKS device (non-destructive)
  • ADD_DISK_BTRFS_ADD — add mapper to btrfs pool
  • BALANCE_TO_RAID1 — convert pool to RAID1 profile
  • REMOVE_DISK_GRACEFUL — btrfs device remove (data migrates)
  • REMOVE_DISK_MISSING_EXPLICIT — btrfs device remove missing (requires --allow-remove-missing + BRAID_CONFIRM)
  • CLOSE_LUKS_MAPPER — cryptsetup close
  • VERIFY_POOL_HEALTH — confirm pool state matches expectations
  • VERIFY_EXPECTED_DISK_SET — confirm pool members match config

Checkpoint schema

  • Active: /var/lib/braid/apply-state.json
  • History: /var/lib/braid/history/<plan_id>.json (last 20 retained)

Backward compatibility

braid-add-disk is now an error stub that directs operators to braid init-disk + braid apply. braid-status is deleted (replaced by braid status). braid-remove-disk remains as a standalone legacy script (not yet ported to the Rust CLI).

Constraint

Two commands (plan then apply) instead of one. This is intentional — deterministic dry-run before mutation prevents accidents.

See

  • docs/design/decisions/002-config-first-workflow.md — config-first principle this builds on
  • docs/design/decisions/009-safe-by-construction-reconciliation.md — destructive boundary principle
  • docs/design/decisions/007-disk-pool-management.md — existing pool management spec
  • cli/src/ — Rust CLI implementation

Superseded by 012-intent-cli.md.

Decision: Safe-by-Construction Reconciliation

Principle: Safe-by-construction operations

Context

braid apply originally mixed two fundamentally different operation classes:

  1. One-time destructive initialization — cryptsetup luksFormat destroys all data on the target device. It is not idempotent. Running it twice destroys a working LUKS volume.
  2. Repeatable reconciliation — cryptsetup luksOpen, btrfs device add/remove, balance, verify. These are safe to run repeatedly.

The structural hazard: state ambiguity (temporarily absent disk vs truly new disk) can route execution toward formatting. A disk that was unplugged and replugged could be misidentified as “new” and reformatted, destroying data.

Options considered

  1. Registry-based — track disk identity in a persistent registry to distinguish “new” from “returning”. Adds hidden state, drift risk, and recovery complexity.
  2. Config flags — add lifecycle state (new/existing/replace) to NixOS config. Violates declarative end-state principle — config becomes imperative.
  3. Structural separation — move destructive operations to a separate command that requires explicit operator intent. apply physically cannot format.

Decision

Option 3. Hard boundary between destructive initialization and safe reconciliation.

Architecture

  • braid init-disk <by-id> — the only command that may call cryptsetup luksFormat. Requires the disk to be declared in config, not already LUKS-formatted (unless --force), and not in an active pool. Enforces single-passphrase invariant.
  • braid plan / braid apply — reconciliation only. Emits OPEN_LUKS (non-destructive open), never luksFormat. Non-LUKS disks produce an INIT_REQUIRED warning telling the operator to run init-disk first.

Hard boundary enforcement

cryptsetup luksFormat is forbidden in the plan/apply code path. This is verified by:

  1. Code inspection — no luksFormat call exists in compute_plan(), executor dispatch, or any function reachable from cmd_apply().
  2. Test assertion — braid-apply.py includes an explicit test that apply never contains luksFormat.
  3. The ADD_DISK_LUKS_FORMAT_OPEN action type has been removed from the planner and executor entirely.
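One way to picture the third point is an action enum that simply has no format variant, so the executor cannot dispatch one. The sketch below is illustrative: the Rust type and function names are hypothetical, while the action names are the ones listed in the unified CLI decision.

```rust
/// Actions the planner may emit. There is deliberately no variant that
/// formats a device, so apply has nothing to dispatch for luksFormat.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Action {
    OpenLuks,
    AddDiskBtrfsAdd,
    BalanceToRaid1,
    RemoveDiskGraceful,
    RemoveDiskMissingExplicit,
    CloseLuksMapper,
    VerifyPoolHealth,
    VerifyExpectedDiskSet,
}

const ALL_ACTIONS: [Action; 8] = [
    Action::OpenLuks,
    Action::AddDiskBtrfsAdd,
    Action::BalanceToRaid1,
    Action::RemoveDiskGraceful,
    Action::RemoveDiskMissingExplicit,
    Action::CloseLuksMapper,
    Action::VerifyPoolHealth,
    Action::VerifyExpectedDiskSet,
];

impl Action {
    /// Wire name as it would appear in plan JSON.
    fn name(self) -> &'static str {
        match self {
            Action::OpenLuks => "OPEN_LUKS",
            Action::AddDiskBtrfsAdd => "ADD_DISK_BTRFS_ADD",
            Action::BalanceToRaid1 => "BALANCE_TO_RAID1",
            Action::RemoveDiskGraceful => "REMOVE_DISK_GRACEFUL",
            Action::RemoveDiskMissingExplicit => "REMOVE_DISK_MISSING_EXPLICIT",
            Action::CloseLuksMapper => "CLOSE_LUKS_MAPPER",
            Action::VerifyPoolHealth => "VERIFY_POOL_HEALTH",
            Action::VerifyExpectedDiskSet => "VERIFY_EXPECTED_DISK_SET",
        }
    }
}

/// Guard in the spirit of the test assertion: no emitted action mentions FORMAT.
fn contains_format_action(plan: &[Action]) -> bool {
    plan.iter().any(|a| a.name().contains("FORMAT"))
}
```

The exhaustive match means that adding a new variant forces the author to name it, at which point the guard would catch a reintroduced format action.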

Missing-disk policy

Absent configured disks are skipped with a DISK_ABSENT_SKIPPED warning. The plan remains applicable — other safe operations proceed. This prevents a temporarily disconnected disk from blocking all reconciliation.

Missing pool devices (devices in btrfs but not in config) require explicit operator intent to evict: --allow-remove-missing flag plus BRAID_CONFIRM='remove missing device from pool' environment variable. This prevents accidental eviction of temporarily absent disks.

Device identity is established by LUKS UUID, not by path or mapper name. When a config disk is absent, its UUID is unknowable, creating identity ambiguity for removal decisions. If the planner wants to remove a pool device but cannot verify it doesn’t match an absent config disk, the plan is blocked with IDENTITY_AMBIGUOUS_ABSENT_DISK. The operator can override with --allow-remove-ambiguous plus BRAID_CONFIRM='remove despite ambiguous identity'.
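The interaction between absent disks and removal decisions can be modeled as follows. This is a simplified sketch under stated assumptions: `PlanOutcome` and `evaluate_absence` are hypothetical, the inputs are reduced to counts and booleans, and the BRAID_CONFIRM check that accompanies each override flag is left out.

```rust
#[derive(Debug, Default, PartialEq)]
struct PlanOutcome {
    warnings: Vec<&'static str>,
    blocked_reasons: Vec<&'static str>,
}

impl PlanOutcome {
    fn status(&self) -> &'static str {
        if self.blocked_reasons.is_empty() { "applicable" } else { "blocked" }
    }
}

/// absent_config_disks: configured disks that are not currently attached.
/// wants_pool_removal: the planner intends to remove some pool device.
/// allow_remove_ambiguous: the operator passed --allow-remove-ambiguous.
fn evaluate_absence(
    absent_config_disks: usize,
    wants_pool_removal: bool,
    allow_remove_ambiguous: bool,
) -> PlanOutcome {
    let mut out = PlanOutcome::default();
    // Absent configured disks are skipped with a warning; the plan stays applicable.
    for _ in 0..absent_config_disks {
        out.warnings.push("DISK_ABSENT_SKIPPED");
    }
    // An absent disk has an unknowable LUKS UUID, so any removal could be
    // targeting that very disk unless the operator overrides.
    if wants_pool_removal && absent_config_disks > 0 && !allow_remove_ambiguous {
        out.blocked_reasons.push("IDENTITY_AMBIGUOUS_ABSENT_DISK");
    }
    out
}
```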

Resume strictness

Fresh apply is tolerant of absent disks (skip + warn). But checkpointed in-flight actions are strict: if a pending action’s target becomes absent during --resume, the apply fails with RESUME_TARGET_MISSING. The checkpoint is preserved for retry after the target is restored.
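The difference between the two modes comes down to one branch. A minimal sketch, with hypothetical type and function names:

```rust
#[derive(Debug, PartialEq)]
enum TargetCheck {
    Proceed,
    /// Fresh apply: skip the disk and keep going.
    SkipWithWarning(&'static str),
    /// Resume: stop and keep the checkpoint for a later retry.
    Fail(&'static str),
}

fn check_target(target_present: bool, resuming: bool) -> TargetCheck {
    match (target_present, resuming) {
        (true, _) => TargetCheck::Proceed,
        (false, false) => TargetCheck::SkipWithWarning("DISK_ABSENT_SKIPPED"),
        (false, true) => TargetCheck::Fail("RESUME_TARGET_MISSING"),
    }
}
```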

Constraint

Two commands (init-disk + apply) instead of one. Operators must explicitly initialize each new disk before reconciliation can include it. This is the minimum viable approach given that LUKS formatting is destructive and non-idempotent.

Revisit trigger

If NixOS or LUKS ever provides an idempotent “ensure formatted” primitive that is safe to run on an already-formatted device, the separation can be revisited.

See

Decision: Toolchain pinning

Context

The Rust CLI parses the output of braid’s parser-critical runtime tools (btrfs-progs, cryptsetup, util-linux, NUT, smartmontools, ethtool). Output formats change between tool versions, so a flake update to nixpkgs-unstable could silently break parsers. Generic helpers (coreutils, systemd) are used for basic system operations and are outside braid’s parser contract. Browse has one tolerant UI-only systemd exception: it parses systemctl list-units --output=json for a picker and falls back to raw output on parse failure.

Decision

Pin flake.nix to a specific NixOS stable release (nixos-26.05). Pin only parser-critical tools — those whose output braid parses or whose behavior is part of braid’s correctness model. Generic helpers come from the consumer’s system package set.

How it works

  • Flake input: nixpkgs.url = "github:NixOS/nixpkgs/nixos-26.05" — braid’s own pinned channel, and the source of parser-critical tool packages unless the consumer redirects braid’s nixpkgs input (see the follows note below).
  • Module options: braid.packages.* (cryptsetup, btrfsProgs, utilLinux, nut, smartmontools, ethtool) default to braid’s nixpkgs flake input but can be overridden per-system.
  • PATH wrapping: The wrapper injects cfg.packages.* into PATH. Generic helpers (coreutils, systemd) are resolved from the consumer’s pkgs, not pinned.
  • Two wrapping sites: flake.nix wraps with pkgs.* defaults (for nix run and tests); the module wraps cfg.package with cfg.packages.* (for deployed NixOS systems where package options may be overridden).

Consumer follows decides the actual source

nixosModules.default builds the braid.packages.* defaults with import self.inputs.nixpkgs – braid’s nixpkgs flake input, instantiated cleanly (no consumer overlays). Whether the consumer sets braid.inputs.nixpkgs.follows = "nixpkgs" decides where the pinned tools actually come from.

The recommended default is no follows. With no follows, braid’s nixpkgs input stays on its pinned nixos-26.05, so the pinned tools resolve from braid’s release channel and braid-cli-unwrapped matches the exact binary the release cache publishes – a cache hit instead of a from-source rebuild on the NAS. ADR 029 is the authoritative home for that cache-path-identity rationale; the short version is that follows rebuilds braid against the consumer’s nixpkgs, changing the store path and forcing a recompile.

follows = "nixpkgs" is a valid advanced opt-out (smaller closure via nixpkgs dedup), but it redirects braid’s nixpkgs input to the consumer’s nixpkgs, so the pinned tools then resolve from the consumer’s nixpkgs. The pin therefore guarantees stable parser output only while the consumer’s nixpkgs stays on the same NixOS stable release braid targets (currently nixos-26.05). Within one stable release, tool output formats change only for security fixes, so a consumer aligned on braid’s release is safe; a consumer who bumps nixpkgs ahead of braid moves the storage toolchain with it and re-introduces the parser-drift risk this decision otherwise prevents. If you do opt into follows, mitigate by keeping nixpkgs aligned with braid’s release or pinning braid.packages.*.

Operational escape hatch

Parser-critical tools are pinned by default to the flake’s nixpkgs release, but braid.packages.* overrides are intentional – operators may need a newer upstream version for urgent bugfixes or security patches before braid’s next nixpkgs bump. The override takes precedence. Operator-set braid.packages.* overrides sit outside braid’s committed parser contract: the standard fixture-capture and golden-test recipes build fixed flake checks against the flake’s pkgs, so they do not validate an arbitrary override. Treating an override as supported requires a maintainer to reproduce the fixture-refresh workflow under a temporary local input swap (e.g. --override-input nixpkgs on the capture/test commands, or a local flake edit) at the override’s package version, then re-run just test-rust against the resulting fixtures. Operators who skip this step are running unverified parser inputs.

Classification guideline

Pin when: braid parses the tool’s output, or the tool’s behavior is part of braid’s correctness/safety model.

Use system pkgs when: the tool is a generic helper, braid doesn’t parse its output as a correctness contract, and version drift is unlikely to affect correctness. The Browse Systemd picker is a UI-only exception because it parses systemctl list-units --output=json tolerantly and disables drill-in on parse failure.

New runtime dependencies must be classified into one of these two groups when added.

| Tool | Pinned by default? | Overrideable? | Reason |
| --- | --- | --- | --- |
| btrfs-progs | Yes | Yes (braid.packages.btrfsProgs) | Output parsed by nom combinators and serde JSON |
| cryptsetup | Yes | Yes (braid.packages.cryptsetup) | Output parsed by nom combinators |
| util-linux (lsblk) | Yes | Yes (braid.packages.utilLinux) | lsblk JSON output parsed by serde |
| NUT (upsc) | Yes | Yes (braid.packages.nut) | upsc key: value output parsed by parse_upsc for preflight safety and operator visibility |
| smartmontools | Yes | Yes (braid.packages.smartmontools) | smartctl --json output parsed by parse_smartctl |
| ethtool | Yes | Yes (braid.packages.ethtool) | Wake-on: line parsed by the doctor wake_on_lan check |
| coreutils | No — system pkgs | No option | chown/chmod/realpath/stat — output not parsed |
| systemd | No — system pkgs | No option | systemctl/ask-password commodity behavior; Browse’s list-units JSON picker is tolerant UI-only, not parser-critical |

Upgrading tools

A nixpkgs bump can move parser-critical tools to new output formats, so an upgrade must refresh fixtures and re-run every parser-validation lane – not just confirm tool provenance. These steps mirror the canonical sequence in dev/overview.md (“Refresh fixtures and run tests”); keep the two in sync.

  1. Bump the nixpkgs input to the next stable release and run nix flake update nixpkgs.
  2. Refresh fixtures: just capture-all-fixtures writes golden files under cli/tests/fixtures/nixos-<release>/ (with upsc/ holding the capture-ups-fixtures outputs). just capture-all-fixtures-unstable is the unstable-lane mirror.
  3. Run the parser-validation lanes, updating parsers/tests for any output that changed:
    • just test-rust – golden-fixture parser tests.
    • just test-parsers – live-tool parser canary.
    • just test-vm – VM suite. Its tool-versions check verifies provenance: each pinned tool resolves to a /nix/store/ path on the VM’s PATH and its self-reported version matches pkgs.<tool>.version from the same evaluation. Provenance only – tool-versions does not detect that nixpkgs moved a tool to a new version (both sides advance together), so the fixture and parser tests above are the actual drift gate. Run it alone with just test-vm tool-versions for a quick provenance-only check.

NUT specifically: parse_upsc depends on the key: value shape emitted by pkgs.nut’s upsc client (see reference/nut/clients/upsc.c). A nixpkgs bump that touches networkupstools triggers the same fixture-refresh obligation as the other pinned tools – run just capture-ups-fixtures and just test-rust before merging. The braid-status-ups check under just test-parsers is the live-tool mirror of the golden fixtures.

ethtool specifically: wake_on_lan depends on the Supports Wake-on: and Wake-on: lines emitted by pkgs.ethtool. VM virtio NICs do not provide useful Wake-on-LAN state, so there is no live fixture-capture lane; parser coverage is hand-authored in Rust unit tests, and wrapper provenance is covered by the tool-versions and braid-auto-suspend VM tests.

Alternatives considered

BRAID_*_BIN environment variables

Rejected. Adds a second resolution mechanism alongside PATH. Every callsite would need to check the env var, falling back to PATH. More complexity, same result — Nix already controls PATH.

Absolute paths in Rust (no PATH at all)

Rejected. Would require threading Nix store paths into the Rust binary at build time (via build.rs or env vars). Fragile and non-standard — NixOS convention is PATH wrapping via makeWrapper.

Stay on nixpkgs-unstable

Rejected. Unstable channel updates tool versions without notice. A routine nix flake update could change btrfs-progs output format and break parsers silently. Stable releases change only for security fixes.

Pin all runtime tools (blanket pinning)

Previously active, now superseded. Blanket pinning created unnecessary closure duplication for generic helpers (jq, coreutils) that braid does not parse. The braid.packages.coreutils option was also inconsistently wired — storage.nix used pkgs.coreutils directly, bypassing the option. Selective pinning is simpler and honest about what braid actually depends on.

See

Superseded by 012-intent-cli.md.

Two-Phase Apply (LUKS Pre-Phase)

Context

After a reboot, all LUKS mappers are closed and the btrfs pool is unmounted. The planner runs after probe, which sees no open mappers and no mounted pool. This causes two problems:

  1. Misleading plan display: braid plan shows mkfs.btrfs -f -d single -m dup (may run) for disks that are actually returning pool members. The execute-time superblock check prevents data loss, but the plan output is alarming for routine re-mounts.

  2. Mount failure after reboot: Per-device btrfs device scan <device> doesn’t reliably assemble multi-device pools. Even after opening all LUKS mappers and scanning each device individually, mount can fail with “missing members” because the kernel’s btrfs subsystem hasn’t been told about all members atomically. btrfs device scan (no arguments) scans all block devices and reliably assembles multi-device pools.

Decision

Move LUKS opening into a pre-phase that runs before plan generation. The sequence changes from:

old:  checkpoint check → probe → plan → checkpoint → execute
new:  checkpoint check → luks_prephase → probe → plan → checkpoint → execute

The luks_prephase function:

  1. Opens closed LUKS mappers — iterates config disks, skips absent and already-open, reads passphrase from --passphrase-stdin or TTY lazily (only when the first closed mapper is encountered).
  2. Scans all — runs btrfs device scan (no arguments) to register all open btrfs members with the kernel.
  3. Mounts pool — if the mount point is not already mounted, finds the first open mapper and attempts mount. Missing-members errors are tolerated (not all disks may be available). Hard errors propagate.

After the pre-phase, probe sees accurate state: all available mappers are open, the pool is mounted (if all members are present) or truly empty (bootstrap). The plan has no OPEN_LUKS actions for available disks, and is_bootstrap is accurate.
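Step 1 of the pre-phase, including the lazy passphrase read, might look roughly like this. The sketch is not the braid implementation: `ConfigDisk` and `open_closed_mappers` are hypothetical, and the passphrase source and the LUKS open call are injected as closures so the control flow can be shown without touching real devices. Steps 2 and 3 (scan all, then mount) would follow the returned count.

```rust
/// Hypothetical per-disk state as seen before the pre-phase runs.
struct ConfigDisk {
    name: &'static str,
    present: bool,
    mapper_open: bool,
}

/// Open every present, closed mapper. The passphrase is read at most once,
/// and only when the first closed mapper is encountered.
fn open_closed_mappers<F, O>(
    disks: &mut [ConfigDisk],
    mut read_passphrase: F,
    mut open_luks: O,
) -> Result<usize, String>
where
    F: FnMut() -> Result<String, String>,
    O: FnMut(&str, &str) -> Result<(), String>,
{
    let mut passphrase: Option<String> = None;
    let mut opened = 0;
    for disk in disks.iter_mut() {
        // Skip absent disks and mappers that are already open.
        if !disk.present || disk.mapper_open {
            continue;
        }
        if passphrase.is_none() {
            passphrase = Some(read_passphrase()?);
        }
        let secret = passphrase.as_deref().expect("passphrase was just set");
        open_luks(disk.name, secret)?;
        disk.mapper_open = true;
        opened += 1;
    }
    Ok(opened)
}
```

When every mapper is already open, the passphrase closure is never called, so a routine re-run does not prompt.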

Resume with closed LUKS

If braid apply --resume detects closed LUKS mappers (device exists but /dev/mapper/<name> does not), the checkpoint is invalidated and fresh_apply is called instead. This handles the case where a checkpoint was created pre-reboot and the system has since rebooted. The pre-phase in fresh_apply opens LUKS and re-probes, generating a correct plan.

This avoids the complexity of reconciling a stale checkpoint against post-reboot state. The ActionState state machine doesn’t allow Pending → Completed, so marking pre-reboot work as completed would require weakening type safety.
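The type-safety argument can be illustrated with a small sketch. Only the rule that Pending → Completed is rejected comes from the text above; the InProgress and Failed states, the InvalidTransition error, and the exact set of legal edges are assumptions made for illustration.

```rust
/// Hypothetical action states; only `Pending` and `Completed` are named in the ADR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ActionState {
    Pending,
    InProgress,
    Completed,
    Failed,
}

/// Hypothetical error type returned for an illegal edge.
#[derive(Debug, PartialEq, Eq)]
pub struct InvalidTransition {
    pub from: ActionState,
    pub to: ActionState,
}

impl ActionState {
    /// Allow only edges that pass through `InProgress`, so work can never be
    /// marked completed without having been started in this run.
    pub fn transition_to(self, next: ActionState) -> Result<ActionState, InvalidTransition> {
        use ActionState::*;
        match (self, next) {
            (Pending, InProgress) | (InProgress, Completed) | (InProgress, Failed) => Ok(next),
            _ => Err(InvalidTransition { from: self, to: next }),
        }
    }
}
```

Under this shape, reconciling a stale checkpoint would need a new Pending → Completed edge, which is the weakening the decision avoids.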

BtrfsDeviceScanAll

A new CmdRequest::BtrfsDeviceScanAll variant runs btrfs device scan with no arguments, scanning all block devices. This replaces per-device scans in the pre-phase and in the execute_btrfs_add bootstrap-with-existing-btrfs path.

Pre-Phase Side-Effect Policy

After the pre-phase, LUKS is open and the pool is mounted even if the planner subsequently returns Blocked. This is a change from the old invariant:

  • Old: Blocked = no mutations.
  • New: Blocked = no planned mutations, but LUKS/mount happened as a pre-condition for accurate planning.

This is correct operationally — the pool was supposed to be online — but operators should be aware that braid apply with a blocked plan still opens LUKS and mounts the pool. The passphrase is consumed before the plan is generated.

Alternatives Considered

Reconcile stale checkpoint on resume

Walk the old checkpoint, detect which actions completed pre-reboot (by checking mapper/mount state), and mark them completed. Rejected because:

  • ActionState::transition_to doesn’t allow Pending → Completed
  • Complex reconciliation logic with risk of incorrect state detection
  • Simpler to invalidate and re-plan since pre-phase makes re-planning cheap

Keep LUKS opening in the execute phase

Could add btrfs device scan (no args) to the execute phase. Rejected because the planner would still see inaccurate state, generating misleading plans with mkfs.btrfs (may run) for returning pool members.

Dry-run LUKS open (check without opening)

Could probe LUKS UUIDs without opening to give the planner hints. Rejected as more complex than just opening — LUKS needs to be open anyway, so doing it early is simpler.

Active – Supersedes 008-unified-cli.md and 011-two-phase-apply.md.

Intent CLI

Context

Braid’s plan/apply reconciliation engine was over-engineered for NAS drives, which have ~4 events in their lifetime (create pool, add disk, add another, replace a dead one). The generic reconciler created problems:

  • Risk flattening: routine reboot and adding a disk produced the same output format (a “plan” with “actions”)
  • Combinatorial complexity: --allow-remove-missing, --allow-remove-ambiguous, BRAID_CONFIRM='phrase1;phrase2'
  • Ceremony for routine operations: braid apply after every reboot

Decision

Replace plan/apply with five intent commands:

| Command | Purpose | Risk |
| --- | --- | --- |
| braid add <name=by_id>... | Format + join pool, or recover identity-verified LUKS device | Destructive (new disk), safe (returning braid disk with matching FSID), or refused (non-braid LUKS, foreign pool, no pool to verify) |
| braid remove <name> | Migrate data off present disk, detach from pool | Long-running |
| braid remove-missing --missing-id <devid> | Clean up a stale missing-device entry; restores RAID1 profiles if this clears the last missing device | Long-running |
| braid replace --old <name> --new <name=by_id> | Replace a disk (live or dead) using btrfs replace start; restores RAID1 profiles for missing-path when clearing the last missing device | In-place swap (preserves devid) |
| braid status | Display pool health and disk info | Read-only |

Disk keys

Disk membership is CLI-owned runtime state in /var/lib/braid/pool.json (see 017-runtime-disk-membership.md). pool.json is keyed by LUKS UUID; the disk name is stored as presentation metadata. Disks are added with name=by_id syntax:

braid add toshiba=/dev/disk/by-id/ata-Toshiba_MN07_XXXX \
          ironwolf=/dev/disk/by-id/ata-Ironwolf_ST12_YYYY

Mapper names are braid-<name> (e.g., braid-toshiba) — human-friendly, debuggable in lsblk/systemd logs, deterministic. They are runtime handles, not persistent identity.
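A minimal sketch of how a name=by_id argument could be split and turned into a mapper name. DiskArg, parse_disk_arg, and mapper_name are illustrative names rather than braid's actual API, and the rule that the path must start with /dev/disk/by-id/ is an assumption.

```rust
/// A parsed `name=by_id` argument. Type and field names are illustrative.
#[derive(Debug, PartialEq)]
pub struct DiskArg {
    pub name: String,
    pub by_id: String,
}

/// Split `name=/dev/disk/by-id/...` at the first `=`.
pub fn parse_disk_arg(arg: &str) -> Result<DiskArg, String> {
    let (name, by_id) = arg
        .split_once('=')
        .ok_or_else(|| format!("expected name=by_id, got `{}`", arg))?;
    if name.is_empty() {
        return Err("disk name must not be empty".to_string());
    }
    // Assumption: only stable by-id paths are accepted.
    if !by_id.starts_with("/dev/disk/by-id/") {
        return Err(format!("`{}` is not a /dev/disk/by-id path", by_id));
    }
    Ok(DiskArg {
        name: name.to_string(),
        by_id: by_id.to_string(),
    })
}

/// Mapper names are `braid-<name>`: a runtime handle, not persistent identity.
pub fn mapper_name(disk_name: &str) -> String {
    format!("braid-{}", disk_name)
}
```

With the example above, toshiba=/dev/disk/by-id/ata-Toshiba_MN07_XXXX parses to the name toshiba and the mapper braid-toshiba.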

Safety model

The old architecture used a structural code boundary — luksFormat was literally unreachable from apply. The new architecture replaces this with:

  1. Explicit operator intent: user specifies a disk key and confirms
  2. Layered identity check for existing LUKS devices:
     a. LUKS UUID is the persistent identity. LUKS label braid-<key> is an adoption-safety gate for returning disks; non-braid LUKS is refused outright.
     b. Pool must be mounted – bootstrap refuses existing LUKS (no pool to verify against).
     c. Opened mapper’s btrfs FSID must match the current pool – foreign-pool disks are refused.
     d. Braid-labeled LUKS with no btrfs superblock is refused – this state is ambiguous (clean eviction, partial init, manual wipe, stale data) and cannot be distinguished without tombstones.
     e. A braid-labeled LUKS disk with a btrfs superblock whose FSID matches the mounted pool may be accepted as a returned-disk add target. The add journal records the LUKS UUID before mutation. If the stale btrfs signature would block btrfs device add, braid runs only wipefs --all --types btrfs on the verified mapper and uses btrfs device add -f.
     f. Superblock guard is defense-in-depth on the FSID-matching path for existing-LUKS adds. The bootstrap path accepts only disks classified as fresh non-LUKS during add planning, and the LUKS open helpers verify that any pre-existing braid-<key> mapper is backed by the requested by-id disk before pool creation proceeds. mkfs.btrfs itself is invoked without -f, so its own signature check is the final fail-closed guard.
  3. Unified confirmation with device context: all mutating commands (add, remove, remove-missing, replace) show a rich device-info block (model, size, serial via lsblk) and confirm with Type 'yes' to continue:. Degraded-path warnings are informational text, not special confirmation phrases. --yes skips the prompt for scripting.
  4. Disk name immutability: mutating commands validate names against recorded disk identity and reject renaming or reassignment. Operators must use explicit replace or remove+add workflows instead of renaming.
  5. Journal-protected mutations: mutating commands write pending-op.json before the first irreversible step; it is cleared only after the full operation (including follow-up work like soft balance) succeeds. Existing-pool add, replace, and remove-missing journals are phased. Their PoolMutation phases may reconcile whether the primary btrfs membership mutation committed; their post-maintenance phases may only validate committed membership, repair pool.json, and finish owed resize/balance work. On any error exit, the journal persists to enable braid recover.

--dry-run performs side-effect-free, passphrase-free LUKS probes only – LUKS label reads, and the keyfile credential test used by braid enroll (cryptsetup open --test-passphrase --key-file, which evaluates a credential without activating the device). Checks that require a passphrase or an open mapper – e.g. full identity verification (FSID comparison) – are deferred to execution time when the mapper is closed.

The dry-run preview itself stays on stdout. Side-effect-free probes that nevertheless do bound long-running work – specifically the Argon2-bounded --test-passphrase evaluation in braid enroll --dry-run – emit canonical [wait]/[ok]/[skip] status rows to stderr, per Principle 13 (Announce long-running work). The previous “successful dry-run leaves stderr empty” contract is intentionally relaxed for this case: an Argon2 derivation runs whether or not the user can see it, and silent dry-runs that take seconds-to-minutes look like hangs. The structured preview output is unchanged.

Replace safety constraints

  • --old accepts both live (present in pool) and dead/missing disks.
  • Both paths use btrfs replace start — the sole replacement primitive. Live disks replace in-place; missing disks are rebuilt from RAID redundancy by devid.
  • --missing-id is only valid when --old is dead/missing. Rejected with live --old. Validated against PoolState::missing_devids (live btrfs state via probe::probe_pool).
  • The missing devid is auto-resolved from --old’s persisted pool.json devid, cross-checked against PoolState::missing_devids – independent of how many devices are missing. Because --old’s name already identifies the member, no missing-count gate is needed; --missing-id is an optional cross-check (it must equal the persisted devid, else OldDevidMismatch) and is never required.
  • Mixed state (live --old + pool has missing devices) is rejected – operator must repair the missing device first with braid replace --old <missing-name> --new <new-name>=/dev/disk/by-id/<...>. braid remove-missing is only for intentional cleanup (forgetting stale entries without rebuilding data).
  • No replacement path uses btrfs device add. Missing-path replace may run a post-commit soft RAID1 balance only when it clears the last missing device.
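The missing-devid resolution described in the list above can be sketched as a pure function. OldDevidMismatch is the error named in the text; resolve_missing_devid, ResolveError, and the NotMissing variant are hypothetical names used only for illustration.

```rust
use std::collections::BTreeSet;

#[derive(Debug, PartialEq, Eq)]
pub enum ResolveError {
    /// `--missing-id` was supplied but differs from the devid persisted in pool.json.
    OldDevidMismatch { persisted: u64, given: u64 },
    /// Hypothetical variant: the persisted devid is not in the live missing set.
    NotMissing { devid: u64 },
}

/// Resolve the devid to rebuild for a dead/missing `--old` disk.
/// `persisted` is the devid recorded for `--old`; `missing_devids` is live btrfs state.
pub fn resolve_missing_devid(
    persisted: u64,
    missing_devids: &BTreeSet<u64>,
    missing_id_flag: Option<u64>,
) -> Result<u64, ResolveError> {
    // `--missing-id` is an optional cross-check and is never required.
    if let Some(given) = missing_id_flag {
        if given != persisted {
            return Err(ResolveError::OldDevidMismatch { persisted, given });
        }
    }
    // The result does not depend on how many devices are missing,
    // only on whether this one is.
    if !missing_devids.contains(&persisted) {
        return Err(ResolveError::NotMissing { devid: persisted });
    }
    Ok(persisted)
}
```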

ENOSPC pre-flight check

remove and remove-missing validate that surviving devices have enough unallocated space to absorb the target device’s allocations before invoking btrfs device remove. Without this check, btrfs either fails with ENOSPC immediately or forces the filesystem read-only mid-relocation (reproduced in tests/repro/).

The >=2-survivor remove path treats relocation-probe uncertainty as warn-and-proceed – a miss falls through to btrfs’s clean instant-ENOSPC – while remove-missing and the 2→1 remove path are fail-closed on any uncertainty, because a miss there can crash the filesystem read-only with pending-op.json already written.

remove-missing also refuses an untrusted missing-device allocation shape before btrfs device remove. Its trust check validates shape, not per-type completeness: the targeted missing devid must have exactly one usage stanza, every positive target allocation row must be one of Data/Metadata/System RAID1, and at least one positive supported row must be present. Missing supported row types are treated as zero demand because a sparse 3+ device RAID1 member may legitimately hold only a subset of Data, Metadata, and System chunks.
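A minimal sketch of that shape check, assuming the usage output has already been parsed into per-devid stanzas of rows. AllocRow, is_supported, and allocation_shape_trusted are illustrative names; treating zero-byte rows as unconstrained is an interpretation of "every positive target allocation row".

```rust
/// One parsed allocation row for the targeted devid. Field names are illustrative.
pub struct AllocRow {
    pub kind: String,    // e.g. "Data", "Metadata", "System"
    pub profile: String, // e.g. "RAID1"
    pub bytes: u64,
}

fn is_supported(row: &AllocRow) -> bool {
    matches!(row.kind.as_str(), "Data" | "Metadata" | "System") && row.profile == "RAID1"
}

/// Validate shape, not per-type completeness.
/// `stanzas` holds every usage stanza found for the targeted missing devid.
pub fn allocation_shape_trusted(stanzas: &[Vec<AllocRow>]) -> bool {
    // Exactly one usage stanza for the devid.
    if stanzas.len() != 1 {
        return false;
    }
    let mut saw_supported = false;
    for row in stanzas[0].iter().filter(|r| r.bytes > 0) {
        // Every positive row must be Data/Metadata/System RAID1.
        if !is_supported(row) {
            return false;
        }
        saw_supported = true;
    }
    // At least one positive supported row; absent row types count as zero demand.
    saw_supported
}
```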

Single-survivor cases use a path-specific check:

  • remove (2→1): the RAID1-aware relocation check does not apply (there is only one remaining device, not two). Instead, a single-survivor capacity check derives demand from btrfs filesystem df logical usage – data + 2 * metadata + 2 * system, reflecting the post-balance single + DUP profile on one device – and compares it to the survivor’s device_size - device_slack. This check runs at plan time and is re-run as a pre-journal gate in execute (above journal::write_journal), closing the plan/execute drift window – a survivor over-committed by writes during the confirmation + inhibitor wait is caught before the irreversible -f balance and fails clean, with no pending-op.json stranded.
  • remove-missing on a 2-device RAID1 pool with 1 missing (pool.total_devices == 2 && pool.devices.len() == 1 && pool.missing_count == 1): rejected at preflight. btrfs_rm_device runs btrfs_check_raid_min_devices(num_devices - 1) and returns BTRFS_ERROR_DEV_RAID1_MIN_NOT_MET whenever the remaining device count would drop below the RAID1 minimum of 2, so the call is guaranteed to fail at the kernel level. The supported repair paths for that case are braid replace (preferred) or braid add followed by braid remove-missing.
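The single-survivor arithmetic for the 2→1 remove path can be sketched as follows. DfUsage and the function names are illustrative; the choice of <= for the comparison and the fail-closed handling of overflow are assumptions, since the text only says the two values are compared.

```rust
/// Logical usage in bytes, as reported by `btrfs filesystem df`. Illustrative type.
pub struct DfUsage {
    pub data: u64,
    pub metadata: u64,
    pub system: u64,
}

/// Demand after balancing to single data + DUP metadata/system on one device:
/// data + 2 * metadata + 2 * system. Returns None on arithmetic overflow.
pub fn single_survivor_demand(usage: &DfUsage) -> Option<u64> {
    let metadata = usage.metadata.checked_mul(2)?;
    let system = usage.system.checked_mul(2)?;
    usage.data.checked_add(metadata)?.checked_add(system)
}

/// Compare demand against device_size - device_slack.
/// Any overflow or underflow counts as "does not fit" (fail closed).
pub fn survivor_can_absorb(usage: &DfUsage, device_size: u64, device_slack: u64) -> bool {
    match (
        single_survivor_demand(usage),
        device_size.checked_sub(device_slack),
    ) {
        (Some(demand), Some(capacity)) => demand <= capacity,
        _ => false,
    }
}
```

For example, 100 units of data, 10 of metadata, and 1 of system give a demand of 122, which fits a survivor with 125 usable units but not one with 120.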

NixOS-native automation

  • systemd braid-unlock.service + braid-pool.target for post-boot unlock
  • braid-online.service lifecycle owner (ExecStop=braid lock, RemainAfterExit=yes)

Rejected alternatives

  • Keep plan/apply with simpler flags: Still risk-flattening. The core problem is that a generic reconciler treats “reboot recovery” and “add a new disk” as the same kind of operation.
  • Separate init-disk + apply: The original approach. Created an artificial code boundary that was hard to explain and required ceremony for the common case.

Consequences

  • Five commands instead of three (no init-disk, no plan, no apply; remove split into remove + remove-missing)
  • Dry-run/confirmation coverage is a command category, not a blanket guarantee. The pool/LUKS-lifecycle mutators (add, remove, remove-missing, replace, unlock, lock, enroll, recover) support --dry-run, while discover previews by default and commits with --write. --yes is scoped to the confirmation-gated mutations (add, remove, remove-missing, replace) for scripting. Reactive notification-state maintenance (ack) and internal systemd-invoked paths (scrub-*) are deliberately excluded – they are reversible/self-correcting or machine-contract commands where a dry-run preview adds no operator value.
  • Tab completion returns disk names from pool.json

Decision: Group-based mount point permissions

Context

When braid mounts the pool at /mnt/storage, the btrfs root is root:root 0755. Regular users can’t write — blocking rsync, cp, and Samba workflows. NAS users need group-level access to the mount root without running everything as root.

Decision

The NixOS module declares a storage group and emits the Unix group name in runtime config as pool_access_group. Rust dispatch reconciles mount point permissions (root:<group> 2770) from mark_online after every braid command that results in a mounted pool.

Why group-based (not ACLs, not per-user subvolumes)

  • Group + setgid is the simplest model that covers the NAS use case: a set of trusted users who all need read/write access to the same pool.
  • ACLs add complexity and tooling requirements (getfacl/setfacl) without benefit for the typical home NAS.
  • Per-user subvolumes solve a different problem (isolation), not shared access.
  • This matches the pattern used by TrueNAS and OpenMediaVault.

Why module config drives it

Mount point permissions are OS-level access policy, so the NixOS module owns the group name and whether the fixup is configured. Rust dispatch executes the fixup because it already holds /run/braid-pool.lock through post-mount lifecycle work, which keeps permissions synchronized with the same mounted-pool state that drives braid-online.service.

Why Rust dispatch executes the fixup

The shell wrapper is a pure exec shim that injects tool packages onto PATH, so no logic can run in it after the command finishes. mark_online (cli/src/online_state.rs) therefore applies chown root:<group> + chmod 2770 on the mount point after successful mount-producing commands (unlock, add, recover).

Properties:

  • Explicit – runs synchronously after the mounted-pool command succeeds, before control returns to the caller
  • Covers all mount paths – braid unlock (direct CLI, systemd service, auto-unlock), braid add (bootstrap), and braid recover (recovery) all go through Rust dispatch
  • No async race – unlike a systemd ExecStartPost or path watch, the fixup completes before the caller sees success
  • Idempotent – permissions persist in btrfs metadata; re-running is a no-op
  • Failure-tolerant – warns to stderr if chown/chmod fails; never overrides the wrapped command’s exit code

Why storage as default group name

  • Standard NAS convention (TrueNAS, OMV use similar names)
  • No collision with existing NixOS system groups
  • Configurable via braid.poolAccessGroup; set to null to disable entirely

Scope

This sets ownership and mode on the mount root directory only. It does NOT:

  • Override per-file permissions (files created with restrictive umask remain restrictive)
  • Provide a complete multi-user collaboration model
  • Manage ACLs or sub-directory policies

The setgid bit (2770) ensures new files/directories in the mount root inherit the storage group, but the owning user’s umask still controls the group-write bit on individual files.
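A small sketch of the umask interaction, assuming the usual POSIX rule that a new file's mode is the requested mode with the umask bits cleared (default ACLs are ignored here). effective_mode, group_writable, and has_setgid are illustrative helpers, not part of braid.

```rust
/// Mode a newly created file ends up with: requested bits minus umask bits.
pub fn effective_mode(requested: u32, umask: u32) -> u32 {
    requested & !umask & 0o7777
}

/// True when the group-write bit (0o020) is set.
pub fn group_writable(mode: u32) -> bool {
    mode & 0o020 != 0
}

/// True when the setgid bit (0o2000) is set, as in the mount root's 2770.
pub fn has_setgid(mode: u32) -> bool {
    mode & 0o2000 != 0
}
```

With umask 022 a file requested as 0666 lands at 0644, so other members of the storage group can read but not write it; with umask 002 it lands at 0664 and stays group-writable.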

See

  • cli/src/online_state.rs – mark_online permission fixup
  • modules/braid/options.nix – poolAccessGroup option definition
  • Sane defaults – philosophy on opinionated defaults

First-Class Alerts for Disk Health

Context

Synology NAS boxes beep when a disk develops bad sectors — you hear it, SSH in, and deal with it. Without active alerting, a braid NAS user has no idea anything is wrong unless they happen to run braid status.

Decision

Alert as primary domain concept

braid has first-class Alerts. An Alert represents “something happened that needs human acknowledgment.” Beeping is one notification mechanism for an active alert. braid status is the primary surface for understanding alert details. braid ack acknowledges current alerts and silences notifications.

Shared alert computation

A single shared computation produces an AlertState consumed by all surfaces — braid monitor (exit code), braid status (banner + causes), TUI (banner + indicators). No surface re-encodes alert logic.

Alert causes

AlertCause is an explicit enum:

  • BtrfsDeviceErrors { devid } — non-zero btrfs device stat counters above acked baseline, excluding alert-local missing devids
  • MissingDevice { devid } — device missing from pool
  • SmartdAlert — smartd SMART health warning
  • ComputationError { detail } — probe or parse failed before a structured cause could be determined

The status banner is cause-neutral (“disk health issue detected”); cause details appear below it and in JSON output.

Two detection sources, one alert model

braid owns btrfs device stats + missing device detection. smartd owns SMART monitoring and writes a flag file (/var/lib/braid/smartd-alert) when triggered. The shared computation checks btrfs stats, missing devices, and smartd.

All five btrfs device stat counters trigger alerts

write_io_errs, read_io_errs, flush_io_errs, corruption_errs, generation_errs. Any non-zero counter above the acked baseline triggers an alert for a present recognized devid. Devids in the alert-local missing set are excluded from BtrfsDeviceErrors and alert through MissingDevice instead.

Two kernel paths feed those counters: ordinary I/O and scrub. Scrub records read, checksum, and generation failures by incrementing BTRFS_DEV_STAT_READ_ERRS, BTRFS_DEV_STAT_CORRUPTION_ERRS, and BTRFS_DEV_STAT_GENERATION_ERRS in reference/linux/fs/btrfs/scrub.c:985-993. The monitor polls the same device stats either way, so scrub-discovered uncorrectable errors reach the operator through the same BtrfsDeviceErrors cause and beep as everyday I/O errors; a separate scrub-status alert probe would be redundant with this pipeline.

Latched alerts

Alerts persist until braid ack, even if the triggering condition disappears. A latched alert means “something happened that needs acknowledgment,” not “something is currently true.” Latching avoids cross-source bugs where one source clearing could hide another source’s alert, and it matches Synology UX.

Ack snapshots gating inputs before probing

cmd_ack reads the alert latch, the smartd flag (smartd-alert), and the ack cleanup-pending sentinel (alert-cleanup-pending) once at function entry, before probe_pool_alerts. Every decision in that ack – the gate that decides whether to proceed, the cleanup-only retry branch, and the cleanup that removes alert files – references that single snapshot. If the sentinel is the only live signal, cmd_ack runs a cleanup-only retry branch before probe_pool_alerts so recovery does not depend on probe success and does not rewrite the acknowledged baseline. The alert probe is devid-keyed and intentionally does not depend on LUKS UUID identity or pool FSID. The pool lock at /run/braid-pool.lock already serializes monitor vs ack vs add/remove writers, but the smartd hook is intentionally unlocked, so a per-ack snapshot is the only mechanism that gives ack a coherent view of smartd state.

The smartd flag is cleared during cleanup when either the snapshot observed the flag active or the snapshot’s latch carried a SmartdAlert cause. The first arm covers the normal “flag present, ack silences it” case. The second arm is an explicit exception for the crash-recovery case where a prior cycle latched SmartdAlert but the flag was already absent at snapshot, such as a partially-applied earlier ack, manual state, or filesystem-level divergence. The user’s ack is aimed at the latched smartd source, so a flag that the smartd hook writes during the probe is part of that source and is cleared.

A flag that exists at cleanup time when the snapshot saw neither active smartd state nor a latched SmartdAlert cause arrived after the snapshot and is left in place: the next monitor cycle is responsible for latching it cleanly.
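The flag-clearing rule reduces to a decision over the entry snapshot alone. The sketch below assumes a snapshot with two booleans; AckSnapshot and should_clear_smartd_flag are illustrative names.

```rust
/// What ack observed once, at function entry. Field names are illustrative.
pub struct AckSnapshot {
    /// The smartd flag file was present at snapshot time.
    pub smartd_flag_active: bool,
    /// The latch read at snapshot time carried a SmartdAlert cause.
    pub latch_has_smartd_alert: bool,
}

/// Clear the flag when either arm holds. A flag that appears later, with
/// neither arm set in the snapshot, is left for the next monitor cycle.
pub fn should_clear_smartd_flag(snapshot: &AckSnapshot) -> bool {
    snapshot.smartd_flag_active || snapshot.latch_has_smartd_alert
}
```

The decision never re-reads the flag file, which is what keeps it coherent while the unlocked smartd hook may be writing concurrently.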

Ack state keyed by btrfs devid

Acked baselines are keyed by btrfs devid (acked-stats.json maps stringified devid to baseline) – no path or LUKS UUID mapping is required to associate a stats row with its baseline. The parser captures missing device devids from MISSING sentinel lines.

Membership cross-reference is performed at the alert-pipeline boundary, not at the baseline-keying level. AlertPoolState::recognized_devids (in cli/src/probe.rs) returns the union of present_devids, null_underlying, and missing_devids for the current cycle. Both compute_alert_state and snapshot_current filter btrfs device stats rows against that set before emitting causes or writing baselines. A stats row whose devid is outside the recognized set is treated as transient/stale identity: it cannot latch BtrfsDeviceErrors, and braid ack does not persist a baseline for it, which prevents a loop on the next monitor cycle’s reconcile_acked_stats prune.

Within the recognized set, compute_alert_state also skips rows whose devid is in the alert-local missing set (missing_devids plus null_underlying). Those rows alert through MissingDevice, not BtrfsDeviceErrors, regardless of the device string btrfs printed. snapshot_current still records recognized rows by devid before layering missing_acked = true from the missing set, so a returning member does not re-alert on stale counters already acknowledged while missing.
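A sketch of the two filters applied to each btrfs device stats row, assuming the three devid sets are already populated. The field names and recognized_devids follow the text; RowDisposition, alert_local_missing, and classify_stats_row are hypothetical names for illustration.

```rust
use std::collections::BTreeSet;

#[derive(Debug, Default)]
pub struct AlertPoolState {
    pub present_devids: BTreeSet<u64>,
    pub null_underlying: BTreeSet<u64>,
    pub missing_devids: BTreeSet<u64>,
}

/// How the alert pipeline treats one stats row (illustrative).
#[derive(Debug, PartialEq, Eq)]
pub enum RowDisposition {
    /// Outside the recognized set: no cause latched, no baseline written.
    Unrecognized,
    /// In the alert-local missing set: alerts through MissingDevice.
    AlertsAsMissing,
    /// Present and recognized: counters are compared against the acked baseline.
    CompareCounters,
}

impl AlertPoolState {
    /// Union of present, null-underlying, and missing devids for this cycle.
    pub fn recognized_devids(&self) -> BTreeSet<u64> {
        self.present_devids
            .iter()
            .chain(&self.null_underlying)
            .chain(&self.missing_devids)
            .copied()
            .collect()
    }

    /// missing_devids plus null_underlying.
    pub fn alert_local_missing(&self) -> BTreeSet<u64> {
        self.missing_devids
            .union(&self.null_underlying)
            .copied()
            .collect()
    }

    pub fn classify_stats_row(&self, devid: u64) -> RowDisposition {
        if !self.recognized_devids().contains(&devid) {
            RowDisposition::Unrecognized
        } else if self.alert_local_missing().contains(&devid) {
            RowDisposition::AlertsAsMissing
        } else {
            RowDisposition::CompareCounters
        }
    }
}
```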

Ack state separate from pool.json

Different concerns (identity vs acknowledgment), different write patterns, different risk profiles (precious vs disposable). Stored at /var/lib/braid/acked-stats.json.

Ack state is machine-local

On a new machine, acked state doesn’t exist — everything evaluates fresh.

braid monitor is a pure detector

Checks state and returns an exit code. Does not start/stop services. The systemd wrapper starts the beeper on exit 1.

Exit codes:

  • 0 – ok, pool offline with no active alerts, or pool-lock-contended cycle (silently skipped; re-evaluated on the next timer tick)
  • 1 – alert active (disk health issue OR indeterminate state latched as ComputationError – e.g. probe failure, parse failure, unmapped device)
  • 2 – pre-cmd_monitor setup failure (e.g. pool-lock I/O, config load failure). Reserved for “could not even attempt to detect”; never emitted by cmd_monitor itself.

Fail closed: any failure inside cmd_monitor that leaves pool state indeterminate latches a ComputationError cause and reports exit 1, so the systemd wrapper starts the beeper. Exit 2 means the monitor never ran – a beep would be meaningless because there is no AlertState to report.
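The exit-code contract can be summarized as a mapping. MonitorOutcome and its variants are hypothetical names; only the numeric codes and their meanings come from the list above.

```rust
/// Illustrative outcomes of one monitor invocation.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MonitorOutcome {
    Healthy,
    PoolOfflineNoAlerts,
    /// Pool lock was contended; the cycle is skipped until the next timer tick.
    LockContended,
    /// Any active cause, including a latched ComputationError.
    AlertActive,
    /// Failure before cmd_monitor ran (e.g. config load or pool-lock I/O).
    SetupFailure,
}

pub fn exit_code(outcome: MonitorOutcome) -> i32 {
    match outcome {
        MonitorOutcome::Healthy
        | MonitorOutcome::PoolOfflineNoAlerts
        | MonitorOutcome::LockContended => 0,
        // The systemd wrapper starts the beeper on exit 1.
        MonitorOutcome::AlertActive => 1,
        // No AlertState exists to report, so no beep.
        MonitorOutcome::SetupFailure => 2,
    }
}
```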

Alert-state mutators are serialized by /run/braid-pool.lock. Every command that writes acked-stats.json or alert-latch.json (monitor, ack, add, remove, remove-missing) acquires /run/braid-pool.lock in Rust dispatch (see ADR 026) before reading state or running probes. This is intentionally the same lock used by pool mutators: monitor and ack perform read-modify-write cycles around subprocess I/O, while add/remove/remove-missing prune acked baselines as membership changes. Sharing one lock keeps “baseline and latch clear” authoritative and prevents stale monitor snapshots from resurrecting acknowledged alerts.

Mount presence is read from /proc/self/mountinfo via mount_check::fstype_at_mount_via_fs, not from findmnt. A readable, well-formed mountinfo file with no entry for the configured mount point is legitimate PoolOffline and exits 0. Any mountinfo I/O failure, malformed line, or duplicate target entry is indeterminate state: it surfaces as ProbeError::MountInfo, latches ComputationError, exits 1, and starts the beeper.
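A minimal sketch of that three-way classification over mountinfo text, not the real mount_check implementation. fstype_at_mount and MountInfoError are illustrative names, the parser assumes the standard field layout (six fixed fields, optional fields, a lone "-", then fstype, source, and super options), and octal escapes such as \040 in paths are not decoded.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum MountInfoError {
    Malformed(String),
    DuplicateTarget(String),
}

/// Ok(None): well-formed mountinfo with no entry for `target` (pool offline).
/// Ok(Some(fstype)): exactly one entry. Err: indeterminate state.
pub fn fstype_at_mount(mountinfo: &str, target: &str) -> Result<Option<String>, MountInfoError> {
    let mut found: Option<String> = None;
    for line in mountinfo.lines() {
        let fields: Vec<&str> = line.split_whitespace().collect();
        // Six fixed fields come first; optional fields follow until a lone "-".
        let sep = fields
            .iter()
            .skip(6)
            .position(|f| *f == "-")
            .map(|offset| offset + 6)
            .ok_or_else(|| MountInfoError::Malformed(line.to_string()))?;
        // After the separator: fstype, mount source, super options.
        if fields.len() < sep + 4 {
            return Err(MountInfoError::Malformed(line.to_string()));
        }
        // Field 4 (zero-based) is the mount point.
        if fields[4] == target {
            if found.is_some() {
                return Err(MountInfoError::DuplicateTarget(target.to_string()));
            }
            found = Some(fields[sep + 1].to_string());
        }
    }
    Ok(found)
}
```

Every line is validated even when it does not match the target, so a single malformed line makes the whole read indeterminate rather than silently reporting the pool as offline.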

The monitor also self-heals stale ack state: it resets missing_acked for now-present devids after drive replacement.

Periodic one-shot, not daemon

systemd timer + oneshot service. No mount condition on the timer — braid monitor handles pool-not-mounted gracefully (exit 0).

On by default

braid.monitor.enable defaults to true when braid.enable is true. beep/pcspkr failures are silently swallowed.

Audible doctor beep is opt-in

Plain braid doctor reports the alert-beep check as skipped after confirming beep monitoring is configured. braid doctor --beep runs the canonical braid-beep-probe wrapper so operators can test the real alert sound on purpose. braid doctor --json always skips the audible probe, and --json conflicts with --beep at parse time, so machine-readable output has no audible side effects.

Latch as append/refresh log

The alert latch is an append/refresh log of all unacked causes from all sources. Each monitor cycle loads the existing latch, computes new causes, and merges. Previously-latched causes that aren’t re-detected are carried forward. Newly-detected causes replace their latched counterpart (same key = fresher evidence). This ensures all cause types persist until braid ack, even if the triggering condition resolves — fixing the invariant for all sources, not just journal.
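The merge rule can be sketched as follows. The AlertCause variants mirror the enum described earlier, and same_cause_key and merge_into_latch are the names this document uses, but the bodies are an illustrative reconstruction: keying devid-bearing causes by devid and giving every ComputationError one shared slot is an assumption consistent with the text, not the actual implementation.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum AlertCause {
    BtrfsDeviceErrors { devid: u64 },
    MissingDevice { devid: u64 },
    SmartdAlert,
    ComputationError { detail: String },
}

/// Two causes share a key when fresher evidence for one should replace the other.
pub fn same_cause_key(a: &AlertCause, b: &AlertCause) -> bool {
    use AlertCause::*;
    match (a, b) {
        (BtrfsDeviceErrors { devid: x }, BtrfsDeviceErrors { devid: y }) => x == y,
        (MissingDevice { devid: x }, MissingDevice { devid: y }) => x == y,
        (SmartdAlert, SmartdAlert) => true,
        // Every ComputationError collapses into a single slot.
        (ComputationError { .. }, ComputationError { .. }) => true,
        _ => false,
    }
}

/// Carry forward latched causes; replace same-key entries with fresh ones;
/// append causes that were not latched before.
pub fn merge_into_latch(latched: Vec<AlertCause>, fresh: Vec<AlertCause>) -> Vec<AlertCause> {
    let mut merged = latched;
    for cause in fresh {
        match merged.iter().position(|old| same_cause_key(old, &cause)) {
            Some(index) => merged[index] = cause,
            None => merged.push(cause),
        }
    }
    merged
}
```

A latched MissingDevice that is not re-detected stays in the result, which is what keeps the alert active until braid ack.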

Corrupt latch recovery

load_alert_latch returns Result<Option<AlertState>, LatchLoadError> so callers can distinguish three outcomes: file absent (Ok(None), normal – no active alerts), I/O failure (Err(Read)), and unparseable on-disk content (Err(Parse)). Each caller picks its own fail-closed policy:

  • cmd_monitor is the only path that mutates the latch. On read/parse failure it quarantines the bad bytes by linking alert-latch.json to alert-latch.json.corrupt and then removing the live path, then writes a fresh latch containing a loud ComputationError cause whose detail names the failure. Quarantine uses hard_link + remove_file (not rename) so an already-existing sidecar is detected atomically by link(2)’s EEXIST; when that happens, the first sidecar is preserved as the highest-value forensic snapshot and the new corruption is surfaced only in the ComputationError detail. Any I/O failure during quarantine is folded into the same detail rather than silently dropped. The corruption signal is folded into a single ComputationError (not appended as a second cause), because merge_into_latch collapses every ComputationError into one slot via same_cause_key – appending two would silently drop one.
  • cmd_status is the read-only surface: resolve_alert_state surfaces a corrupt latch as a ComputationError cause but never moves the file (status must not mutate state).
  • cmd_ack treats a corrupt latch as an active alert for gating purposes — otherwise a genuinely unmounted ack would refuse with PoolNotMounted and the user would have no way to clear a corrupt file with the pool offline. Mounted ack and genuinely unmounted ack clean up both alert-latch.json and the .corrupt sidecar. A foreign fstype at the configured mount point is a probe error, not offline ack, and preserves the unreadable latch bytes.

This preserves “latched until ack” even when the on-disk state is unreadable: the operator sees a loud ComputationError, the bad bytes are preserved for forensics until an ack path that can safely clean them up, and ack succeeds for mounted or genuinely unmounted pools.
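The hard_link + remove_file quarantine used by cmd_monitor can be sketched with the standard library. quarantine_latch and the Quarantine enum are illustrative names; removing the live path even when a sidecar already exists is an assumption based on the monitor writing a fresh latch afterwards.

```rust
use std::fs;
use std::io;
use std::path::Path;

#[derive(Debug, PartialEq, Eq)]
pub enum Quarantine {
    /// The corrupt bytes are now reachable only through the sidecar path.
    Moved,
    /// A sidecar already existed; it was kept as the first forensic snapshot.
    SidecarPreserved,
}

/// Link the corrupt latch to the sidecar, then remove the live path.
/// Unlike rename, link(2) fails with EEXIST instead of overwriting an
/// existing sidecar, so the check and the "move" are one atomic step.
pub fn quarantine_latch(live: &Path, sidecar: &Path) -> io::Result<Quarantine> {
    let outcome = match fs::hard_link(live, sidecar) {
        Ok(()) => Quarantine::Moved,
        Err(e) if e.kind() == io::ErrorKind::AlreadyExists => Quarantine::SidecarPreserved,
        Err(e) => return Err(e),
    };
    fs::remove_file(live)?;
    Ok(outcome)
}
```

A caller would fold any returned io::Error, or the SidecarPreserved outcome, into the ComputationError detail rather than dropping it.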

Cleanup ordering and retry-on-failure

Ack cleanup preserves three invariants. First, the beeper stop hook is attempted before any fallible cleanup operation; the hook is best-effort, so the invariant is that the stop attempt runs, not that sound was proven stopped. Second, destructive removals run in smartd-alert -> alert-latch.json -> alert-latch.json.corrupt order, so the corrupt sidecar leaves last and the forensic guarantee above is preserved across cleanup failures. Third, ack writes alert-cleanup-pending after the stop hook and before the first destructive step, then clears it only after the last destructive step succeeds.

CleanupFailed recovery has two cases. If creating alert-cleanup-pending itself fails, no destructive removal has run, so the original entry signals are byte-identical and the retry is driven by the normal ack path. If marker creation succeeded and a later step failed, the sentinel remains on disk. cmd_ack consults that sentinel before probing; when it is the only live signal, the hoisted cleanup-only branch reruns cleanup without probe_pool_alerts, without runner requests, and without rewriting acked-stats.json. Either path makes re-running braid ack after fixing the I/O fault genuinely idempotent.

Offline ack policy

braid ack works with the pool locked, but only when the pool is genuinely unmounted: /proc/self/mountinfo has no entry for the configured mount point. If the configured mount point is occupied by a non-btrfs filesystem, cmd_ack returns ProbeError::NotBtrfs; it must not clear alert-latch.json, remove smartd-alert, create or rewrite acked-stats.json, stop the beeper, or quarantine corrupt latch bytes.

For genuine offline ack, the persistence layer has an asymmetry by cause type:

  • MissingDevice { devid } – offline ack reads the latch and applies missing_acked = true to that devid in acked-stats.json (insert-or-update; existing device_stats baselines are preserved). The next mounted monitor cycle suppresses the cause, and reconcile_acked_stats self-heals missing_acked back to false if the device returns.
  • BtrfsDeviceErrors { devid } – offline ack refuses with an actionable error (“cannot ack btrfs device errors while pool is offline – unlock the pool first”). The counter baseline that suppresses re-firing is the current output of btrfs device stats, which requires a mounted pool. Refusing the whole ack (not partial-acking other causes) avoids leaving the operator in an “I acked but it still says ALERT” state.
  • SmartdAlert – offline ack removes the smartd flag file (the authoritative trigger source); no acked-stats.json write is needed.
  • ComputationError – offline ack removes the latch; the cause re-fires on the next monitor cycle only if the underlying computation still fails.

Coupled to the asymmetry: offline ack only loads acked-stats.json when at least one MissingDevice cause is latched, so an unrelated corrupt acked-stats.json cannot block an offline ack of a pure SmartdAlert or ComputationError latch. When acked-stats.json is loaded (a MissingDevice cause is being applied), the fail-closed load_acked_stats_fallible is used so corrupt files are propagated as I/O errors rather than silently overwritten – matching the policy in drop_ghost_acked_for_devids.
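The per-cause asymmetry can be sketched as a planning function over the latched causes. OfflineAckPlan, OfflineAckRefused, and plan_offline_ack are hypothetical names; the sketch only decides what would be done and performs no I/O.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum AlertCause {
    BtrfsDeviceErrors { devid: u64 },
    MissingDevice { devid: u64 },
    SmartdAlert,
    ComputationError { detail: String },
}

/// What a genuine offline ack would do (illustrative).
#[derive(Debug, Default, PartialEq, Eq)]
pub struct OfflineAckPlan {
    /// Devids to mark missing_acked = true in acked-stats.json.
    pub mark_missing_acked: Vec<u64>,
    /// Remove the smartd flag file.
    pub remove_smartd_flag: bool,
}

impl OfflineAckPlan {
    /// acked-stats.json is loaded only when a MissingDevice cause is applied.
    pub fn loads_acked_stats(&self) -> bool {
        !self.mark_missing_acked.is_empty()
    }
}

/// The whole ack is refused; no other cause is partially acked.
#[derive(Debug, PartialEq, Eq)]
pub struct OfflineAckRefused {
    pub devid: u64,
}

pub fn plan_offline_ack(latched: &[AlertCause]) -> Result<OfflineAckPlan, OfflineAckRefused> {
    // Counter baselines need `btrfs device stats`, which needs a mounted pool.
    let blocked = latched.iter().find_map(|cause| match cause {
        AlertCause::BtrfsDeviceErrors { devid } => Some(*devid),
        _ => None,
    });
    if let Some(devid) = blocked {
        return Err(OfflineAckRefused { devid });
    }
    let mut plan = OfflineAckPlan::default();
    for cause in latched {
        match cause {
            AlertCause::MissingDevice { devid } => plan.mark_missing_acked.push(*devid),
            AlertCause::SmartdAlert => plan.remove_smartd_flag = true,
            // Removing the latch is enough; the cause re-fires only if it recurs.
            AlertCause::ComputationError { .. } => {}
            AlertCause::BtrfsDeviceErrors { .. } => unreachable!("refused above"),
        }
    }
    Ok(plan)
}
```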

Acked-stats hygiene across pool membership changes

btrfs allocates new devids as last_devid + 1 (kernel: fs/btrfs/volumes.c, find_next_devid), so a remove-then-add sequence reuses the removed devid only when that devid was the current maximum at remove time. Removing a non-max devid leaves a permanent gap. A stale acked-stats entry for a reused devid would otherwise carry the previous holder’s device_stats baseline (suppressing health alerts until counters exceed the ghost) or its missing_acked = true flag (suppressing missing-device alerts) onto the fresh disk.
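A tiny model of the allocation rule stated above, taking last_devid to be the highest devid currently in the pool. next_devid is an illustrative helper, not kernel or braid code.

```rust
/// Model of "last_devid + 1": the next devid is one past the current maximum.
/// An empty pool starts at 1 in this model.
pub fn next_devid(existing: &[u64]) -> u64 {
    existing.iter().copied().max().unwrap_or(0) + 1
}
```

Starting from devids 1, 2, 3: removing 3 leaves a maximum of 2, so the next add reuses devid 3. Removing 2 instead leaves a maximum of 3, so the next add gets devid 4 and 2 stays a gap.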

Invariant: a reused devid must never inherit the previous holder’s ack baseline.

Three layers enforce it:

  1. Add-time guard (correctness boundary): cmd_add clears acked-stats unconditionally on bootstrap and drops the assigned devid per-disk inside the live-pool add loop. cmd_recover, when finishing an interrupted add, mirrors both: bootstrap-recovery calls remove_acked_stats, and live-add recovery drops every journaled target’s devid (per-arm after a replayed pool_add_device, and via a final sweep when the target was already live at recovery entry – the committed-but-closed crash window). Cleanup failure here is command-fatal in both cmd_add and cmd_recover: the error names the stage and instructs the user to delete the file before relying on alerts.
  2. Remove-time prune (hygiene): cmd_remove and cmd_remove_missing drop the affected devid on success. cmd_recover mirrors the prune for committed removes only – the Remove guard may restore a target whose eviction did not complete, in which case its acked-stats entry is a legitimate baseline that must survive. Cleanup failure here is non-fatal (warning) – the next add for that devid will catch it via layer 1. The cmd_remove planner enriches the journaled pre_membership with the target’s live btrfs devid so recovery can resolve it after a discover-time pool.json.
  3. Monitor reconcile (defense-in-depth): cmd_monitor prunes orphan entries (devid no longer in pool.present_devids, pool.null_underlying, or pool.missing_devids) every cycle. This catches crash recovery and manual btrfs operations performed outside braid. It cannot detect ghost data once a devid is reused, so the add-time layer is the boundary for that case. The read itself uses load_acked_stats_fallible so a corrupt or unreadable acked-stats.json latches ComputationError instead of silently re-firing acked causes against an empty baseline, matching offline ack and drop_ghost_acked_for_devids. A save failure during reconcile latches the same ComputationError so a persistent FS write fault (EROFS, ENOSPC, or EACCES on acked-stats.json or its parent) surfaces via exit-1 beep rather than accumulating only in journald.
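The orphan rule in layer 3 can be sketched as a set-membership filter. This is an illustration under assumptions, not braid's code: `prune_orphans` is a hypothetical name, and the acked entries are modelled as a plain map keyed by devid.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Drop every acked entry whose devid appears in none of the three live
/// sets, and return the pruned devids in ascending order.
fn prune_orphans<V>(
    acked: &mut BTreeMap<u64, V>,
    present: &BTreeSet<u64>,
    null_underlying: &BTreeSet<u64>,
    missing: &BTreeSet<u64>,
) -> Vec<u64> {
    let orphans: Vec<u64> = acked
        .keys()
        .copied()
        .filter(|d| {
            !present.contains(d) && !null_underlying.contains(d) && !missing.contains(d)
        })
        .collect();
    for devid in &orphans {
        acked.remove(devid);
    }
    orphans
}
```

The filter only sees devids, which is why it cannot catch a reused devid: once a fresh disk holds the number, the stale entry looks legitimate to this check.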

Backstop: independently of those three layers, the alert computation fails loud when the acked baseline is no longer comparable to the current counter stream. compute_alert_state treats an acked counter that exceeds the current btrfs device stats value as 0 and alerts on any nonzero current. btrfs device-stats counters are persistent and monotonic (reset only by -z, which braid never runs), so the only ways a current value can sit below the ack baseline are a reused devid that inherited a ghost baseline before add/recover cleanup dropped its acked entry (the committed-but-closed crash window above), or an operator resetting the live counters with btrfs device stats -z. The three layers aim to remove a stale baseline; this guard ensures that if one transiently survives, it cannot suppress a later nonzero counter.
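A minimal sketch of the backstop comparison, assuming a single counter and its acked baseline. `effective_baseline` and `counter_alerts` are hypothetical names; the real compute_alert_state handles more than this.

```rust
/// An acked baseline above the current counter is not comparable to the
/// current counter stream, so it is treated as 0.
fn effective_baseline(acked: u64, current: u64) -> u64 {
    if acked > current {
        0
    } else {
        acked
    }
}

/// Alert when the current counter exceeds the effective baseline.
fn counter_alerts(acked: u64, current: u64) -> bool {
    current > effective_baseline(acked, current)
}
```

With an acked baseline of 5, a current value of 5 stays quiet and 7 alerts as usual; a current value of 2 also alerts, because the baseline of 5 is discarded rather than allowed to mask the nonzero counter.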

Rejected alternatives

  • Daemon-based monitoring: more complex lifecycle management for no benefit over a timer + oneshot
  • Storing alerts in a database: unnecessary complexity; file-based flag + JSON is sufficient
  • Per-surface alert logic: each surface re-checking btrfs stats independently would lead to inconsistencies
  • Counter-based thresholds (e.g., alert after N errors): any non-zero counter above baseline is worth investigating; thresholds delay detection
  • Kernel journal scanning: originally implemented as a supplementary alert source scanning journalctl -k for “BTRFS error” messages. Removed because btrfs commits every 30 seconds, which increments device stats counters for any disk error within that window. The 5-minute monitor poll catches those counters reliably. Journal scanning was redundant with device stats and added significant complexity (cursor tracking, regex parsing, crash-safe cursor ordering, latch merge logic). Repro VMs in tests/repro/kernel-journal-* preserve the empirical evidence from the original investigation.

Decision: HDD defaults

Principle: HDD defaults

Context

braid manages a NAS pool of LUKS-encrypted btrfs RAID1 drives. The typical deployment is bulk storage on large-capacity spinning drives (e.g., 12–16 TB HDDs). Several defaults already assume rotational media:

  • cryptsetup open omits --allow-discards, so TRIM/discard requests from btrfs never reach the underlying device. btrfs also exposes a mount-layer discard knob (discard=async, the kernel default since 6.2 on devices that advertise discard support), but braid’s LUKS layer gates it: without --allow-discards, the mapped device never reports discard support upward, so the kernel default never activates and any explicit discard=async would be silently dropped.
  • noatime avoids relatime’s read-triggered metadata write-amplification on every RAID1 copy.
  • Monthly scrub interval is tuned for spinning disk wear and noise.

Making braid flash-aware would mean adding --allow-discards (with its security tradeoff of leaking block-usage patterns through the encryption layer), flash-specific scrub/balance scheduling, and flash-targeted test coverage. None of this is warranted for the target use case.

Note: braid already handles flash media in its monitoring paths — NVMe SMART parsing (cli/src/parse/smartctl.rs) and transport-type detection (cli/src/tui/probe.rs) work with any drive type. This decision is about operational defaults, not monitoring.

Decision

Defaults are chosen for HDD NAS deployments. Flash media (SSDs, NVMe, USB sticks) may function but are not a validated or optimized target.

Tradeoffs accepted

  • No TRIM passthrough — braid pins discard off at the LUKS layer by omitting --allow-discards and, by consequence, at the btrfs mount layer because no effective discard=async can pass through regardless of kernel default. SSDs used with braid experience increased write amplification and performance degradation over time.
  • No flash-specific testing — flash-related issues in LUKS or mount configuration may go unnoticed.

Alternatives considered

Default-on btrfs compression (compress=zstd:1)

Rejected. braid targets HDD bulk-storage NAS pools where the dominant content is media and archives: video, audio, photos, and other formats already compressed at the application layer. Transparent filesystem compression usually saves little or no space on that mix, while the btrfs heuristic that skips incompressible extents still spends CPU on each write. On low-power NAS hardware, that cost is not free.

Reversal is also partial. Removing compress=... affects future writes only; extents already written compressed stay compressed until the data is rewritten or explicitly defragmented. Making compression the default would bake that conversion cost into pre-v1.0 software for users who later discover their workload does not benefit. For that reason, cli/src/cmd.rs base_mount_options() intentionally omits compression.

Fedora’s compress=zstd:1 precedent is workstation root filesystems on SSDs: binaries, logs, configs, and package payloads. That precedent does not transfer cleanly to HDD bulk-storage NAS workloads. Users with compression-friendly data, such as text, code, or document servers, can opt specific paths into compression today with btrfs property set <path> compression zstd; reference/btrfs-progs/Documentation/btrfs-property.rst documents this modern per-inode interface. This is preferable to legacy chattr +c, which uses ext2-style flags and defaults to zlib. No braid feature gate is needed for this per-path opt-in.

See

  • cli/src/cmd.rs – CryptsetupLuksOpen and CryptsetupLuksOpenKeyFile omit --allow-discards
  • cli/src/cmd.rs – base_mount_options() omits any discard option, relying on the kernel default that is itself gated by the LUKS layer
  • cli/src/cmd.rs#base_mount_options – sets noatime to avoid relatime’s read-triggered metadata write-amplification on RAID1.
  • Sane defaults — scrub interval tuned for spinning disks
  • ADR 031: Drive-wake posture – mounted drives are treated as awake; noatime is not spindown management.

Auto-Suspend via autosuspend + braid idle

Context

HDDs in a btrfs RAID1 NAS can’t rely on per-drive spindown — btrfs periodic commits (every 30s), smartd polling, and braid-monitor health checks wake drives frequently. The user wants the NAS to be quiet and low-power when not in use, and responsive when needed.

Scope note: this decision governs whole-system suspend-to-RAM (S3). Its per-drive spindown context explains why braid chose system suspend for mounted NAS idle behavior; it does not preclude a future opt-in per-drive braid.autoSpinDown that parks drives only while the pool is locked. See ADR 031: Drive-wake posture.

Decision

Whole-system suspend-to-RAM

The entire NixOS machine suspends when idle. This preserves LUKS keys and the mounted btrfs pool in RAM — no re-unlock ceremony on wake. Drives stop, CPU stops, fans stop. Wake via Wake-on-LAN or RTC alarm.

autosuspend as the daemon

autosuspend is an existing Python daemon in nixpkgs that handles idle countdown, periodic activity checks, and RTC wakeup scheduling. When the host is idle, it executes the configured suspend command (typically systemctl suspend). systemd/logind then applies the actual sleep request semantics, including honoring active high-level sleep inhibitor locks. Writing a custom daemon for this would reimplement what autosuspend already does well.

braid configures autosuspend via the existing NixOS module (services.autosuspend). The user writes braid.autoSuspend.enable = true; and gets sensible defaults.

braid idle as the btrfs check

A separate CLI command (braid idle) checks for an in-flight scrub plus any kernel exclusive operation (balance, balance paused, device add, device remove, device replace, resize, swap activate). The exclusive-operation states are read from /sys/fs/btrfs/<fsid>/exclusive_operation – the same source preflight.rs uses for mutating commands – so the two code paths cannot disagree about what counts as busy. Scrub is read separately via btrfs scrub status because scrub is not in the kernel’s exclusive-operation set (see reference/btrfs-progs/common/utils.c:1188-1197). autosuspend calls braid idle via ExternalCommand check.

Why a separate command rather than inline shell in autosuspend config:

  • braid already has the parser for btrfs scrub status and the sysfs read helper
  • Fail-closed behavior (probe failures map to Busy(Unknown) -> exit 1 -> block suspend; setup/config errors stay at exit 2 and also block via !) is easier to get right in Rust than in shell
  • Testable with unit tests via MockRunner + a Filesystem mock

braid wol-ready as the Wake-on-LAN check

A hidden CLI command (braid wol-ready) checks the configured braid.autoSuspend.wolInterface immediately before autosuspend is allowed to suspend the host. It runs ethtool <iface> through braid’s command runner and reuses the same WoL classifier as braid doctor, so the on-demand diagnostic and the per-suspend gate cannot drift on what counts as magic-packet armed.

Invariant: braid.autoSuspend will not automatically suspend the NAS unless braid.autoSuspend.wolInterface currently reports Wake-on: g.

The command is intentionally scoped to braid’s autosuspend path. Manual systemctl suspend remains available for admin maintenance, local testing, and machines where the operator deliberately accepts the wake risk. A universal sleep.target gate was considered and deferred because it would turn braid’s claim from “braid will not auto-suspend unsafely” into “this machine may not suspend at all,” which is a broader and more surprising ownership boundary.

Exit code inversion

braid idle and braid wol-ready follow natural Unix convention (exit 0 = success). autosuspend’s ExternalCommand convention is inverted (exit 0 = activity detected). The NixOS module bridges this with bash -c '! <command>':

| braid command | braid exit | Meaning | After ! | autosuspend result |
|---|---|---|---|---|
| braid idle | 0 | idle | 1 | allow suspend |
| braid idle | 1 | busy or probe failure | 0 | block suspend (fail-closed) |
| braid idle | 2 | setup error | 0 | block suspend (fail-closed) |
| braid wol-ready | 0 | Wake-on: g armed | 1 | allow suspend |
| braid wol-ready | 1 | not armed or unverifiable | 0 | block suspend (fail-closed) |
| braid wol-ready | 2 | setup error | 0 | block suspend (fail-closed) |
| either command | timeout | signal-killable overrun >10s | 0 | block suspend (fail-closed) |

timeout must be inside bash -c so its non-zero overrun result is inverted by !. An outer timeout (timeout -k 2 10 bash -c '! braid idle') would fail open: bash is killed before ! runs, so autosuspend sees the non-zero timeout result and treats it as no activity. Coreutils’ timeout sends TERM at the main deadline, and -k 2 escalates to KILL two seconds later for processes that ignore or delay TERM (see reference/coreutils/src/timeout.c).
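The inversion and the timeout placement can be modelled as plain functions over exit statuses. This is a sketch for reasoning only: the function names are hypothetical, the inner command shape is an assumption, and 124 is the status coreutils timeout reports when its deadline fires.

```rust
/// Exit status reported by `bash -c '! <command>'`: `!` turns 0 into 1 and
/// any non-zero status into 0.
fn after_bang(status: i32) -> i32 {
    if status == 0 {
        1
    } else {
        0
    }
}

/// autosuspend's ExternalCommand convention as described above:
/// exit 0 means "activity detected", which blocks suspend.
fn blocks_suspend(check_exit: i32) -> bool {
    check_exit == 0
}

/// Status coreutils `timeout` reports when the deadline fires
/// (137 instead if it had to escalate to KILL; both are non-zero).
const TIMEOUT_EXPIRED: i32 = 124;

/// Inner placement (assumed shape): `bash -c '! timeout -k 2 10 <command>'`.
/// The overrun status passes through `!` like any other failure.
fn inner_timeout_overrun() -> i32 {
    after_bang(TIMEOUT_EXPIRED)
}

/// Outer placement: `timeout -k 2 10 bash -c '! <command>'`.
/// bash is killed before `!` applies, so autosuspend sees the raw status.
fn outer_timeout_overrun() -> i32 {
    TIMEOUT_EXPIRED
}
```

In this model `blocks_suspend(inner_timeout_overrun())` is true while `blocks_suspend(outer_timeout_overrun())` is false, which is the fail-open seam the inner placement avoids.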

Scope of the timeout invariant: this covers signal-killable command overruns (parser regression, slow userspace probe, network-FS latency). Uninterruptible kernel waits (process in D state on a wedged ioctl) are not bounded by timeout(1) and remain a separate failure mode; under that condition the autosuspend tick itself stalls until the syscall returns, so the system stays awake by virtue of not deciding.

Mount probe reads /proc/self/mountinfo directly

braid idle’s initial mount-presence check (is_btrfs_mounted) reads /proc/self/mountinfo via the existing Filesystem abstraction rather than shelling out to findmnt. Rationale: the mount probe is a fail-closed safety gate; any subprocess fallback path that maps “non-zero exit + empty stderr” to “no mount” reintroduces the fail-open seam this gate exists to prevent. The kernel-maintained mountinfo file gives a direct answer in one syscall, with no fork/exec.

Octal-escaped mount-point fields (\040, \011, \012, \134) are decoded before comparison so configured mount paths containing whitespace match correctly.
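A minimal sketch of the decoding step, assuming a simplified rule: any backslash followed by three octal digits is decoded and everything else passes through unchanged. `decode_mountinfo_field` is a hypothetical name, and braid's handling of malformed escapes may differ.

```rust
/// Decode octal escapes in a mountinfo path field
/// (`\040` space, `\011` tab, `\012` newline, `\134` backslash).
/// Returns bytes because mount paths are not guaranteed to be UTF-8.
fn decode_mountinfo_field(field: &str) -> Vec<u8> {
    let bytes = field.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        // An escape needs a backslash plus three following bytes.
        if bytes[i] == b'\\' && i + 3 < bytes.len() {
            let digits = &bytes[i + 1..i + 4];
            if digits.iter().all(|b| (b'0'..=b'7').contains(b)) {
                let value = digits
                    .iter()
                    .fold(0u32, |acc, b| acc * 8 + u32::from(*b - b'0'));
                if value <= 0xFF {
                    out.push(value as u8);
                    i += 4;
                    continue;
                }
            }
        }
        // Not a complete escape: copy the byte as-is (sketch simplification).
        out.push(bytes[i]);
        i += 1;
    }
    out
}
```

A configured mount point such as "/mnt/my pool" appears in mountinfo as `/mnt/my\040pool`, so comparing without decoding would report the pool as unmounted.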

IO errors (file unreadable, EIO), malformed mountinfo lines, and ambiguous duplicate target entries surface as Busy(BusyReason::Unknown), exit 1, and block suspend. “Don’t know” never becomes “allow suspend”.

Exclusive-op probe scans /sys/fs/btrfs/* directly

After the mount check passes, cmd_idle reads exclusive_operation from every entry under /sys/fs/btrfs/ via preflight::check_any_btrfs_exclusive_op and returns busy as soon as any one is non-none. No findmnt or btrfs filesystem show subprocesses are invoked on this path; only the scrub probe (btrfs scrub status) remains, because scrub is not part of the kernel’s exclop_def[] set (reference/btrfs-progs/common/utils.c:1186-1194).

Semantics: any in-flight exclusive op on any btrfs filesystem on the host counts as busy. On a typical braid host (one btrfs filesystem, the pool) this is identical to a fsid-scoped check. On a host with btrfs root alongside the pool the reported BusyReason may name an op on the non-pool fs, but the suspend decision is still correct – autosuspend’s job is to err conservative, and “do not suspend while any btrfs is mid-balance/replace/etc.” is the right answer regardless of which fs is busy.

Pseudo-dir skip is by name allowlist (features, debug), not by “absorb any NotFound on read.” The kernel only creates exclusive_operation under per-fsid <uuid>/ dirs (reference/linux/fs/btrfs/sysfs.c:29-47), but treating a missing attribute on any other listed entry as “must have been a pseudo-dir” would silently swallow a real failure mode: a fsid dir whose attribute disappears mid-scan during a concurrent unmount race. Under the allowlist, that race surfaces as ExclusiveOpError::Read and blocks suspend.

Fail-closed branches: list_dir("/sys/fs/btrfs") IO errors, any read error on a non-allowlisted entry’s exclusive_operation (including NotFound), unrecognized parser values, and an empty /sys/fs/btrfs/ after the mount check passed all surface as Busy(BusyReason::Unknown) and exit 1.
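The fail-closed branches can be sketched as a pure function over the directory listing. This is an illustration under assumptions: the types and `scan_exclusive_ops` are hypothetical, reads are modelled as already-collected results, and "empty" is taken to mean that no non-allowlisted entry was inspected.

```rust
use std::collections::BTreeMap;

#[derive(Debug, PartialEq)]
enum ExclOp {
    Idle,
    Busy(String),
    Unknown,
}

/// Pseudo-dirs skipped by name, per the allowlist described above.
const PSEUDO_DIRS: [&str; 2] = ["features", "debug"];

/// Values the text lists as exclusive operations.
const KNOWN_OPS: [&str; 7] = [
    "balance",
    "balance paused",
    "device add",
    "device remove",
    "device replace",
    "resize",
    "swap activate",
];

/// `entries` maps each name listed under /sys/fs/btrfs to the outcome of
/// reading its exclusive_operation attribute; `None` stands for any read
/// error, including NotFound.
fn scan_exclusive_ops(entries: &BTreeMap<&str, Option<&str>>) -> ExclOp {
    let mut inspected = 0;
    for (name, read) in entries {
        if PSEUDO_DIRS.contains(name) {
            continue;
        }
        inspected += 1;
        match (*read).map(|s| s.trim()) {
            Some("none") => {}
            Some(op) if KNOWN_OPS.contains(&op) => return ExclOp::Busy(op.to_string()),
            // Read error or unrecognized value: fail closed.
            _ => return ExclOp::Unknown,
        }
    }
    if inspected == 0 {
        ExclOp::Unknown
    } else {
        ExclOp::Idle
    }
}
```

Only the `Idle` result lets suspend proceed; `Busy` and `Unknown` both block it, so a vanished attribute on a fsid directory can never be read as "nothing running".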

The scrub probe is held to the same contract: a parse_btrfs_scrub_status result of ScrubState::Unknown (empty stdout or an unrecognized Status: word) surfaces as Busy(BusyReason::Unknown) and exits 1. Parser drift must not silently allow suspend.

probe::probe_fsid is no longer reached from cmd_idle. It remains in use by non-idle callers (lock.rs and the preflight pipelines that need a UUID for other purposes), and is out of scope for this gate.

Scrub probe is scoped to the pool mount point

Unlike the exclusive-op scan, the scrub probe is not host-wide: cmd_idle runs btrfs scrub status against only the configured pool mount point. A scrub on a non-pool btrfs (e.g. the btrfs root) is therefore not detected and does not block suspend.

This asymmetry is intentional. braid’s autosuspend gate protects the braid pool, not every btrfs on the host – the same ownership boundary that scopes braid wol-ready to braid’s suspend path rather than installing a universal sleep.target gate. The exclusive-op scan is broader only because one pass over /sys/fs/btrfs/* reads every filesystem’s state for free and errs conservative; matching that breadth for scrub would mean spawning a btrfs scrub status subprocess per filesystem on every autosuspend tick, for coverage braid does not own.

SSH always on, SMB/NFS auto-detected

SSH check is unconditional — braid requires SSH for unlock, and an active SSH session means someone is working. SMB and NFS checks are auto-detected from config.services.samba.enable and config.services.nfs.server.enable to avoid false positives on systems that don’t run those services.

smartd and braid-monitor run opportunistically

Neither smartd nor braid-monitor should wake the system or prevent suspend. They run naturally during wake windows (user access, scrub wakeup). SMART counters accumulate in drive firmware regardless of polling. The only scheduled wakeup is for the monthly btrfs scrub timer.

Paused balance = busy

A paused balance holds the btrfs exclusive-operation lock. The mutating-command preflight in preflight.rs already treats a paused balance as a hard refusal (it can block indefinitely, so braid cannot enqueue behind it). braid idle applies the same logic: it does not suspend mid-pause.

WoL managed by braid

braid.autoSuspend.wolInterface is required when sleep is enabled. braid sets networking.interfaces.<iface>.wakeOnLan.enable = true on the specified interface. A build-time assertion prevents enabling sleep without WoL – otherwise the NAS suspends and becomes unreachable until someone physically presses the power button. braid doctor verifies the live NIC reports magic-packet wake (Wake-on: g) for that interface on demand, and autosuspend also runs the hidden braid wol-ready check every suspend cycle. The BIOS-side WoL setting is the user’s responsibility (can’t be automated from NixOS).

Some drivers can reset WoL after resume. braid does not currently re-arm WoL from a system-sleep hook; instead, the autosuspend gate keeps the machine awake after the first wake if Wake-on: g disappears. That is the safe degraded direction: visible and diagnosable via braid doctor, rather than silently sleeping into an unreachable state.

Fully qualified store paths

The ExternalCommand command strings use absolute /nix/store/ paths for timeout, bash, and braid. autosuspend runs the commands outside braid’s wrapper, so PATH is not guaranteed to include these tools.

See

Active – Supersedes 002-config-first-workflow.md. Refined by 024-luks-uuid-identity.md.

Decision: Runtime Disk Membership

Principle: CLI-owned membership

Context

The original design declared disk membership in braid.disks (NixOS config). Adding a drive required editing Nix config, running nixos-rebuild switch, then running braid add <name>. This was wrong: disk membership is operational state (“which drives are in my pool right now”), not system architecture (“what services should run on this machine”). Requiring a rebuild to add a drive added ceremony and created a category error — NixOS config is for declarative system shape, not mutable runtime state.

Decision

Move disk membership to a CLI-owned runtime state file. The NixOS module provides infrastructure (mount point, services, toolchain). The CLI owns which disks are in the pool.

State model

/var/lib/braid/pool.json — CLI-owned membership keyed by LUKS UUID:

{
  "disks": {
    "11111111-1111-1111-1111-111111111111": {
      "name": "toshiba",
      "by_id": "/dev/disk/by-id/ata-TOSHIBA_...",
      "devid": 1,
      "added_at": "2026-03-27T12:00:00Z"
    }
  }
}

The map key is the member’s persistent identity. The name field is the operator-facing disk name used in commands, mapper names, and labels; it is not the identity. by_id is the hardware address used to find the disk before it is opened. devid is live btrfs state captured after membership commits and is only a fallback binding key when btrfs reports a missing or null-underlying device by devid alone. added_at is historical state – once set on a member, it is preserved across all subsequent writes (unlock, recover, replace, add, etc.). These fields replace the former disk-map.json advisory file.

/etc/braid/config.json — machine config (no disk information):

{ "mount_point": "/mnt/storage" }

Standalone CLI installs may keep this minimal shape. Module-generated configs also include pool_access_group and systemd_lifecycle:

{
  "mount_point": "/mnt/storage",
  "pool_access_group": "storage",
  "systemd_lifecycle": true
}

/var/lib/braid/pending-op.json — pending-operation journal (transient, present only during mutations).

Mutation ordering

All mutating commands validate, write pending-op.json with pre/target membership snapshots, perform the irreversible btrfs membership change, write pool.json to reflect the committed live membership, then advance the journal to a post-maintenance phase before performing any required post-mutation maintenance and clearing the journal.

pool.json reflects committed btrfs membership, not necessarily completion of follow-up maintenance such as RAID1 rebalance or resize. While pending-op.json exists, braid recover is responsible for replaying or completing any owed post-mutation work before clearing the journal when the balance state is safe to interpret. If owed RAID1 replay finds a paused, running, or unknown btrfs balance, recover fails closed and preserves the journal for manual inspection. Recovery in a post-maintenance phase must not rerun the primary btrfs membership command (device add, device remove, or replace start).

For add, membership commits when btrfs device add returns success; the post-add RAID1 balance is follow-up maintenance. For remove, membership commits when btrfs device remove returns success; writing pool.json before that would be wrong because btrfs still owns the device. For remove-missing, membership commits when btrfs device remove <devid> against the missing devid returns success; the post-remove soft balance that restores RAID1 redundancy for chunks created during degraded operation is follow-up maintenance. For replace, membership commits when btrfs replace start -B completes; the post-replace resize, and (for missing-path replacements that clear the last missing device) the soft balance, are follow-up maintenance.

The journal provides crash safety: if braid crashes mid-operation, the journal triggers recovery mode on next invocation. If a crash lands after pool.json was written but before the post-maintenance phase rewrite, braid recover detects the committed live topology, rewrites the journal to the post phase, and then finishes only the owed maintenance unless owed RAID1 replay finds a paused, running, or unknown balance state.
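The recovery decision described above can be sketched as a small table over the journal phase and the observed live topology. All names here are hypothetical, and the sketch deliberately leaves the per-command behaviour of an uncommitted mutation unspecified.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum JournalPhase {
    PoolMutation,
    PostMaintenance,
}

#[derive(Debug, PartialEq)]
enum RecoverPlan {
    /// Live topology has not reached the target; what happens next is
    /// command-specific and outside this sketch.
    CommandSpecific,
    /// Crash landed after commit but before the journal rewrite: advance the
    /// journal to the post phase, then finish owed maintenance.
    AdvanceThenFinishMaintenance,
    /// Journal already in the post phase: finish owed maintenance only.
    FinishMaintenance,
}

fn plan(phase: JournalPhase, live_matches_target: bool) -> RecoverPlan {
    match (phase, live_matches_target) {
        (JournalPhase::PostMaintenance, _) => RecoverPlan::FinishMaintenance,
        (JournalPhase::PoolMutation, true) => RecoverPlan::AdvanceThenFinishMaintenance,
        (JournalPhase::PoolMutation, false) => RecoverPlan::CommandSpecific,
    }
}

/// The primary btrfs membership command must never be rerun once the
/// mutation is known to have committed.
fn may_rerun_primary(plan: &RecoverPlan) -> bool {
    matches!(plan, RecoverPlan::CommandSpecific)
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum BalanceState {
    Idle,
    Running,
    Paused,
    Unknown,
}

/// Owed RAID1 replay proceeds only from an idle balance state; anything
/// else fails closed and the journal is preserved.
fn replay_allowed(balance: BalanceState) -> bool {
    balance == BalanceState::Idle
}
```

The point of the split is that the two "finish maintenance" plans never reach the primary command, so a crash after commit cannot cause a second device add, remove, or replace.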

Recovery mode

When pending-op.json exists, braid enters recovery mode. Membership, mount, and key-enrollment commands (add, remove, remove-missing, replace, unlock, enroll, discover --write) hard-fail; read-only diagnostic and cleanup surfaces (status, doctor, lock, bare discover) stay available. braid recover is the only command that clears the journal: it opens LUKS devices, mounts the pool (with --allow-degraded if needed), and rebuilds or repairs membership from the live btrfs pool topology – not from LUKS label scanning, which could include labeled-but-never-added disks.
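The command gating can be sketched as a lookup on the subcommand. This is a simplified illustration: `recovery_gate` and `Gate` are hypothetical names, argument parsing is reduced to a string slice, and commands the paragraph does not name are reported as unspecified rather than guessed.

```rust
#[derive(Debug, PartialEq)]
enum Gate {
    /// Not refused by the recovery-mode guard (other preflights may still apply).
    Allowed,
    /// Hard-fails while pending-op.json exists.
    Refused,
    /// Command not named in the text above; this sketch takes no position.
    Unspecified,
}

fn recovery_gate(argv: &[&str], journal_present: bool) -> Gate {
    if !journal_present {
        return Gate::Allowed;
    }
    let command = match argv.first() {
        Some(c) => *c,
        None => return Gate::Unspecified,
    };
    let has_write = argv.iter().any(|a| *a == "--write");
    match command {
        "add" | "remove" | "remove-missing" | "replace" | "unlock" | "enroll" => Gate::Refused,
        "discover" if has_write => Gate::Refused,
        "discover" | "status" | "doctor" | "lock" | "recover" => Gate::Allowed,
        _ => Gate::Unspecified,
    }
}
```

Note the split on discover: the bare form only reads and stays available, while --write would persist membership and is refused.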

State contract

  • pool.json is authoritative. unlock requires it.
  • unlock enriches pool.json metadata (devid, added_at, and current by-id observations where appropriate) after mount via live btrfs state, but never changes membership (disk set).
  • If pool.json is missing or corrupt, unlock and the mutating membership commands fail with a clear error directing the user to braid add or braid discover --write.
  • braid lock – the user-facing command, the braid-online.service ExecStop reentry, and braid lock --dry-run – tolerates a missing or corrupt pool.json: it warns and proceeds with empty membership. The per-candidate cryptsetup luksUUID probe in build_close_sets_* (cli/src/lock.rs) is the fail-closed guard, so cleanup remains complete and correct. No lock pathway hard-fails on an unloadable pool.json; dry-run folds the warning into its stdout preview while the real paths emit it to stderr (see ADR 026).
  • If pool.json is readable but stale (a member fails to probe), unlock warns and proceeds with the members it can probe. It never rewrites pool.json.
  • If a member’s UUID key doesn’t match the probed device’s LUKS UUID, unlock fatally errors. This catches swapped, reformatted, or corrupted drives before any LUKS open or mount is attempted.
  • Only these commands write pool.json membership: add, remove, replace, remove-missing, discover --write, recover.

Recovery

Recovery is always explicit, never implicit:

  • braid recover opens LUKS devices and mounts the pool if needed. Mount membership is phase-specific: existing-pool add and remove-missing pool-mutation phases mount from the pre-operation membership, add/remove-missing post phases and replace post-maintenance recovery mount from the committed target membership, and replace pool-mutation, bootstrap-add pool-mutation (empty pre-operation snapshot), and plain remove recovery mount from the admission membership (pre-operation snapshot plus target-only members, which for replace covers an in-flight dev_replace). This is the only path out of recovery mode (journal present). It probes actual pool topology, not LUKS labels. Each live member’s by_id is resolved at recovery time by walking /dev/disk/by-id/ and matching the symlink whose canonical target equals the live device’s backing kernel path – by_id is never copied from the journal snapshot, which can be stale if hardware enumeration changed since the mutation started. If no by-id symlink resolves to a live pool member, recovery hard-fails with an actionable remediation message rather than persisting a guess. When rebuilding pool.json, recover preserves each member’s added_at from the current pool.json if present, else from the journal’s pre/target membership snapshot; only members with no prior timestamp get a fresh now_iso() stamp. by_id, the UUID key, and devid remain live-derived or journal-verified according to the recovery phase. When the pool is already mounted by an external process (circumventing braid unlock’s pending-op preflight) and the journal records Replace::PoolMutation, recovery refuses and directs the operator to braid lock; braid recover so a fresh mount session can be opened and the relock cycle can clear any kernel-resumed-dev_replace staleness. Replace post-maintenance recovery is allowed on an already-mounted pool because the primary replace has already committed.
  • braid discover scans /dev/disk/by-id/* for LUKS devices with braid-* labels. Displays what it finds. With --write, persists to pool.json. This is for initial setup recovery (lost pool.json), not for crash recovery.
  • The normal path to create pool.json is braid add.
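The by_id resolution step in braid recover can be sketched with the standard library alone. This is an assumption-laden illustration: `resolve_by_id` is a hypothetical name, and it returns every matching symlink because the text does not say how braid chooses when several by-id names point at one device.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Return the symlinks under `by_id_dir` whose canonical target equals the
/// canonical path of `live_dev`, sorted for stable output.
fn resolve_by_id(by_id_dir: &Path, live_dev: &Path) -> io::Result<Vec<PathBuf>> {
    let want = fs::canonicalize(live_dev)?;
    let mut matches = Vec::new();
    for entry in fs::read_dir(by_id_dir)? {
        let link = entry?.path();
        // A dangling symlink cannot name the live device; skip it.
        if let Ok(target) = fs::canonicalize(&link) {
            if target == want {
                matches.push(link);
            }
        }
    }
    matches.sort();
    Ok(matches)
}
```

A caller following the rule above would treat an empty result as a hard failure with a remediation message, never falling back to the by_id recorded in the journal snapshot.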

CLI syntax

braid add takes name=by_id positional pairs:

braid add toshiba=/dev/disk/by-id/ata-TOSHIBA wd=/dev/disk/by-id/ata-WDC

braid replace --new takes the same format:

braid replace --old toshiba --new seagate=/dev/disk/by-id/ata-Seagate_NEW
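A minimal sketch of parsing one name=by_id pair, as used by both commands above. `parse_disk_arg` is a hypothetical name, and the two validation rules (non-empty name, /dev/disk/by-id/ prefix) are assumptions for illustration rather than braid's documented checks.

```rust
/// Split one `name=by_id` argument at the first `=`.
fn parse_disk_arg(arg: &str) -> Result<(&str, &str), String> {
    let (name, by_id) = arg
        .split_once('=')
        .ok_or_else(|| format!("expected name=by_id, got {:?}", arg))?;
    if name.is_empty() {
        return Err(format!("empty disk name in {:?}", arg));
    }
    // Assumed rule: only stable by-id paths are accepted.
    if !by_id.starts_with("/dev/disk/by-id/") {
        return Err(format!("{:?} is not a /dev/disk/by-id/ path", by_id));
    }
    Ok((name, by_id))
}
```

Splitting at the first `=` keeps any later `=` characters inside the path portion, so the name is always the text before the first separator.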

Lifecycle model

The NixOS module no longer generates data-pool fileSystems, LUKS entries, or btrfs-device-scan. Instead:

  • braid-online.service — lifecycle owner (ExecStop=braid lock, RemainAfterExit=yes). Started by Rust dispatch via mark_online after a successful unlock, add, or recover that leaves the pool mounted, gated on systemd_lifecycle = true in runtime config.
  • braid-pool.target — wants unlock only, does not start braid-online directly.
  • Consumer services bind to mnt-storage.mount (auto-generated by systemd from /proc/mounts).

Rejected alternatives

  1. Keep braid.disks but make it optional — half-measure that leaves two sources of truth. Users would be confused about which one matters.
  2. Auto-discover on unlock — makes unlock a mutation command. If discovery finds the wrong devices (e.g., a test disk with a braid-* label), the pool is corrupted silently. Explicit membership is safer.
  3. Store membership in btrfs metadata — btrfs doesn’t have a user-data field on devices. Would require a convention (e.g., subvolume with a JSON file), adding fragility and a chicken-and-egg problem for unlock.

Consequences

  • Adding a drive is one command: braid add name=/dev/disk/by-id/.... No nixos-rebuild.
  • pool.json must exist before unlock can run. First-time setup: braid add creates it.
  • braid discover --write is the explicit recovery path for lost/corrupt pool.json.
  • The NixOS module’s braid.disks option is removed entirely.

See

  • cli/src/membership.rs – load/save/validate membership, DiskMember, PoolMembership, enrich_from_pool_state, foreign_luks_uuids (pure helper consumed by braid doctor’s foreign_luks_uuid check)
  • cli/src/journal.rs — pending-operation journal (pre/target membership snapshots)
  • cli/src/recover.rs — rebuild membership from live pool state
  • cli/src/preflight.rs – check_no_pending_operation recovery mode guard
  • cli/src/discover.rs — LUKS label scanning
  • modules/braid/storage.nix – braid-online.service, no data-pool fileSystems
  • modules/braid/options.nix — no braid.disks

Decision: Systemd Lifecycle State Machine

Principle: Resilient by default

Context

braid needs systemd integration for three things: interactive unlock, unattended unlock, and clean shutdown (LUKS close before power-off). The module must not generate data-pool fileSystems or boot.initrd.luks.devices entries — those create hard boot dependencies on the data pool (see 003-resilient-boot.md). Instead, the CLI owns LUKS open/close and btrfs mount/unmount at runtime, and a thin systemd layer provides the entry points and shutdown hook.

Units

                          ┌─────────────────────┐
                          │  braid-pool.target   │  entry point
                          │  wants + after       │
                          └─────────┬────────────┘
                                    │ (soft dep)
                          ┌─────────▼────────────┐
                          │  braid-unlock.service │  interactive passphrase
                          │  oneshot              │
                          └─────────┬────────────┘
                                    │ (CLI marks online on success)
                          ┌─────────▼────────────┐
                          │  braid-online.service │  lifecycle owner
                          │  ExecStart=/bin/true  │
                          │  ExecStop=braid lock  │  --systemd-stop
                          │  oneshot, RAE         │
                          └──────────────────────┘

  braid-auto-unlock.service          (alternative unlock path, boot-time)
  wantedBy multi-user.target          activates braid-online via same CLI path

  mnt-storage.mount                   (auto-generated by systemd from /proc/mounts)

  braid-monitor.timer -> braid-monitor.service -> braid-alert.service
  ConditionPathIsMountPoint            (health polling, skipped when pool not mounted)

  braid-scrub.timer -> braid-scrub.service
  braid-online.service -> braid-scrub-resume-trigger.service -> braid-scrub.service
  BindsTo + After braid-online.service    (lifecycle-bound periodic scrub)
  Persistent=true                          (catch-up on activation)

RAE = RemainAfterExit = true

braid-pool.target — entry point

Public handle for “bring pool online.” User runs systemctl start braid-pool.target.

  • wants (not requires) braid-unlock.service — soft dependency. Unlock failure does not fail the target, and the target cannot block boot because nothing requires it.
  • after braid-unlock.service — ordering only.
  • Does not want or require braid-online.service. The CLI activates that separately after confirming the mount succeeded.

braid-unlock.service — interactive passphrase unlock

Single orchestrator: opens all LUKS devices and mounts the btrfs pool in one shot. Guarantees exactly one passphrase prompt (avoids relying on systemd-ask-password cache behavior across multiple LUKS units).

  • Type = oneshot — runs once, returns to inactive on completion. ConditionPathIsMountPoint (below) prevents re-run while mounted; the inactive state allows systemctl start braid-pool.target to re-unlock after a prior braid lock.
  • ConditionPathIsMountPoint = !${mountPoint} — skips if pool already mounted.
  • Calls systemd-ask-password --timeout=0 --id=braid | braid unlock --passphrase-stdin.

braid-auto-unlock.service — unattended USB keyfile unlock

Optional (only created when braid.autoUnlock.enable = true). Runs at boot, unlocks from a USB keyfile without interactive prompt.

  • wantedBy = [ "multi-user.target" ] — starts automatically at boot.
  • after = [ "local-fs.target" ] — waits for /run to exist.
  • ConditionPathIsMountPoint = !${mountPoint} — skips if pool already mounted.
  • No RemainAfterExit — intentional. If USB is absent at boot (service exits 0 on skip), a later systemctl start braid-auto-unlock can re-run when the USB is inserted.
  • Mounts USB read-only, validates keyfile path (symlink defense), runs braid unlock --key-file, always unmounts USB after (never leaves keyfile accessible).
  • Always exits 0 — failures are logged to the journal but never reported as unit failure, because auto-unlock must not block boot under any circumstance.

braid-online.service — lifecycle owner

State-ownership service. Its only purpose is to mark “pool is online” and run the bounded braid lock stop path on stop.

  • ExecStart = /bin/true — no work. Exists for its ExecStop hook.
  • ExecStop = braid lock --systemd-stop --deadline-secs <n> – unmounts pool and closes all LUKS on shutdown or manual stop with a bounded stop-coordinator/pool-lock wait below TimeoutStopSec. In this mode, braid permits a running or paused btrfs balance: a running balance is explicitly paused before unmount, an already-paused balance proceeds to unmount, and every other exclusive operation is refused. If the blocking btrfs balance userspace process briefly holds the mount fd after its parent dies, the systemd-stop path uses a longer transient-busy umount retry than plain braid lock.
  • RemainAfterExit = true — persists “active” state.
  • ConditionPathIsMountPoint = ${mountPoint} – systemd skips activation when the pool is not mounted (systemctl start returns 0 but the unit stays inactive). Defense-in-depth: the CLI’s mountpoint -q check is the primary gate, but this condition prevents direct systemctl start from leaving the unit active while unmounted.
  • TimeoutStopSec = 300s – raises the stop timeout from the 90s default so a slow braid lock is not SIGKILL’d mid-operation.
  • Not in any dependency chain. Neither the target nor unlock services want/require it. Activated exclusively by the CLI after mountpoint -q confirms the pool is mounted.

mnt-storage.mount — readiness contract

Auto-generated by systemd from /proc/mounts when the btrfs pool is mounted. Consumer services bind to this unit.

braid-monitor.timer + braid-monitor.service — health polling

Periodic oneshot (default: every 5 minutes). Pure detector — checks btrfs device stats for errors.

  • ConditionPathIsMountPoint — skipped cleanly when pool is not mounted (no dependency-failure noise from timer). No After or BindsTo on mnt-storage.mount — those directives force systemd to load the unit, which doesn’t exist before the first unlock.
  • Exit code 1 from braid monitor → starts braid-alert.service.
  • braid monitor fails closed: probe/parse/stats/mountinfo failures, acked-stats.json baseline read/parse failures, and alert-latch read/quarantine failures latch AlertCause::ComputationError and exit 1, so the wrapper above starts the beeper. Exit 0 is reserved for healthy, pool-offline, and pool-lock-contended cycles; exit 2 is reserved for pre-cmd_monitor setup failures (e.g. pool-lock I/O, config load failure) and is never emitted by cmd_monitor itself. See ADR 014 fail-closed contract for the cause taxonomy.
  • The gate and the fail-closed path are independent mount checks, so the gate cannot mask a real alert. ConditionPathIsMountPoint resolves through statx(STATX_ATTR_MOUNT_ROOT) (then name_to_handle_at(2), then /proc/self/fdinfo) – a kernel VFS query, never a parse of /proc/self/mountinfo text. The fail-closed path above instead parses that text and latches ComputationError on a malformed line, duplicate target, or read error. On a genuinely-mounted pool statx reports a mount root regardless of any text anomaly, so the service runs and the beep fires – the protective beep is never gated away. The gate only short-circuits a statx-confirmed-offline pool; the sole beep it suppresses is braid’s conservative ComputationError on an offline pool with anomalous mountinfo text, which is not a disk-health alert.
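
The exit-code contract above can be summarized as a small mapping. This is a minimal sketch, not braid's actual code: `MonitorOutcome` and `exit_code` are hypothetical names, and `DiskErrors` stands in for any real health alert.

```rust
/// Outcome of one monitor cycle (illustrative, not braid's real type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MonitorOutcome {
    Healthy,
    PoolOffline,
    PoolLockContended,
    DiskErrors,
    ComputationError,
    /// Failure before cmd_monitor runs, e.g. config load failure.
    SetupFailure,
}

/// Map an outcome to the process exit code described in the bullets above.
fn exit_code(outcome: MonitorOutcome) -> i32 {
    match outcome {
        // Exit 0 is reserved for healthy, offline, and contended cycles.
        MonitorOutcome::Healthy
        | MonitorOutcome::PoolOffline
        | MonitorOutcome::PoolLockContended => 0,
        // Fail closed: a computation error is reported like a real alert.
        MonitorOutcome::DiskErrors | MonitorOutcome::ComputationError => 1,
        // Reserved for pre-cmd_monitor setup failures.
        MonitorOutcome::SetupFailure => 2,
    }
}
```

Keeping the mapping in one exhaustive `match` means a new outcome cannot be added without choosing its exit code.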

braid-scrub.timer + scrub service + resume trigger – lifecycle-bound scrub

Periodic scrub (default: monthly). Uses a timer-lifecycle pattern distinct from the monitor’s ConditionPathIsMountPoint-only approach.

  • Timer is wantedBy, BindsTo, and After braid-online.service. Starts when pool comes online, stops when pool goes offline.
  • Persistent=true + AccuracySec=1d. When the timer activates (pool unlock), systemd compares the last-trigger stamp against OnCalendar. If a scrub was overdue during the offline period, it fires immediately.
  • braid-scrub.service is the only foreground scrub runner. It is Type=simple; its ExecStart runs the internal braid scrub-resume-or-start --mount <mount> command, which resumes saved scrub progress first and starts a fresh scrub only when btrfs reports nothing resumable.
  • braid-scrub.service uses a shared ExecStop cancel script – same pattern as the nixpkgs btrfs scrub service. This cancels in-flight scrub on lock or shutdown through btrfs scrub cancel, leaving btrfs-progs’ /var/lib/btrfs/scrub.status.<fsid> progress file available for the next resume.
  • braid-scrub-resume-trigger.service is the pool-online predicate-and-poke path. It is Type=oneshot and is wantedBy, BindsTo, and After braid-online.service; it runs the internal braid scrub-needs-resume --mount <mount> command and starts braid-scrub.service with systemctl start --no-block only when saved progress is resumable.
  • The scrub service and resume trigger use BindsTo + After braid-online.service. On shutdown or systemctl stop braid-online.service, systemd stops them before braid lock runs.
  • ConditionPathIsMountPoint on the scrub service and trigger is defense-in-depth.
  • Serialization via single runner. Only braid-scrub.service ever runs btrfs scrub; both activation paths (timer and trigger) issue systemctl start braid-scrub.service, and systemd coalesces overlapping starts for the same unit. A completed scrub-resume-or-start run satisfies both an overdue timer fire and a pool-online resumable state, with no flock and no /run/braid-scrub.lock.
  • Conflicts + Before shutdown.target and sleep.target on the scrub service. The short-lived resume trigger also uses Conflicts + Before sleep.target so suspend setup wins cleanly against pool-online activation.
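
The two activation paths and the single runner can be modeled as below. This is a sketch under assumptions: `Activation`, `ScrubAction`, and `runner_action` are illustrative names, and the `resumable` flag stands in for whatever braid scrub-needs-resume reports.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Activation {
    TimerFire,
    PoolOnlineTrigger,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ScrubAction {
    Resume,
    StartFresh,
}

/// What braid-scrub.service ends up doing for a given activation path,
/// or None when that path never starts the service.
fn runner_action(path: Activation, resumable: bool) -> Option<ScrubAction> {
    match (path, resumable) {
        // The trigger only pokes the service when saved progress is resumable.
        (Activation::PoolOnlineTrigger, false) => None,
        // Either path resumes saved progress first when there is any.
        (_, true) => Some(ScrubAction::Resume),
        // A timer fire with nothing to resume starts a fresh scrub.
        (Activation::TimerFire, false) => Some(ScrubAction::StartFresh),
    }
}
```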

braid-alert.service — notification

Started by monitor on error detection. Beeps via PC speaker (if enabled) and/or runs a custom alert command. Stopped by braid ack.

Rust dispatch as synchronization layer

The wrapper (braid-wrapper.sh) is a pure exec shim: it sets the module-controlled PATH and execs the Rust binary. Synchronization lives in Rust dispatch (cli/src/main.rs), which owns the pool lock, braid-online.service lifecycle updates, and shutdown stop coordination. See 026-pool-lock-rust-owned.md.

modules/braid/cli.nix emits systemd_lifecycle = true for module-managed installs. Standalone CLI deployments omit it; those configs still get mount permission fixups but do not touch braid-online.service.

After every unlock, add, or recover attempt:

  1. Rust dispatch acquires /run/braid-pool.lock, loads config and membership, and snapshots braid-online.service ActiveState only when systemd_lifecycle = true.
  2. CLI opens LUKS + mounts pool when the command reaches its mount step. (recover self-mounts when recovering from an interrupted operation.)
  3. Before dispatch returns, success or failure, Rust runs mark_online while the pool lock is still held.
  4. mark_online checks mountpoint -q; pre-mount failures short-circuit here.
  5. Rust sets permissions (root:poolAccessGroup 2770) if poolAccessGroup is configured.
  6. When systemd_lifecycle = true, Rust starts braid-online.service only when the initial snapshot was inactive or failed.
  7. If activation fails: prints WARNING to stderr, then preserves the command’s original exit result. Pool is mounted and usable; only the shutdown hook is missing.
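
The post-processing in steps 4 to 6 reduces to a few guarded decisions. The sketch below is hypothetical: `Observed`, `PostStep`, and `mark_online_steps` are names invented for illustration, and the real mark_online also handles errors and warnings that are omitted here.

```rust
/// Post-processing step that dispatch may run (illustrative names).
#[derive(Debug, PartialEq, Eq)]
enum PostStep {
    FixPermissions,
    StartOnlineUnit,
}

/// What dispatch observed while holding the pool lock.
struct Observed<'a> {
    mounted: bool,
    pool_access_group: Option<&'a str>,
    systemd_lifecycle: bool,
    /// True when the initial ActiveState snapshot was inactive or failed.
    snapshot_inactive_or_failed: bool,
}

fn mark_online_steps(o: &Observed<'_>) -> Vec<PostStep> {
    let mut steps = Vec::new();
    // A pre-mount failure leaves nothing mounted, so nothing else runs.
    if !o.mounted {
        return steps;
    }
    // Permissions are fixed only when a pool access group is configured.
    if o.pool_access_group.is_some() {
        steps.push(PostStep::FixPermissions);
    }
    // The lifecycle owner is started only for module-managed installs
    // whose snapshot showed the unit inactive or failed.
    if o.systemd_lifecycle && o.snapshot_inactive_or_failed {
        steps.push(PostStep::StartOnlineUnit);
    }
    steps
}
```

A standalone CLI deployment (no `systemd_lifecycle`) still gets the permission fixup but never touches the unit.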

On lock:

  1. Plain braid lock acquires /run/braid-stop-coordinator.lock, then /run/braid-pool.lock.
  2. When systemd_lifecycle = true, Rust stops braid-scrub.timer, braid-scrub-resume-trigger.service, then braid-scrub.service (timer first prevents re-trigger; trigger before service prevents the trigger from queuing a fresh start of the service being stopped; service last cancels in-flight scrub).
  3. When systemd_lifecycle = true, Rust iterates systemctl show -P BoundBy braid-online.service and stops each remaining bound consumer (samba, nfs, future). The scrub units already handled in step 2 are skipped. For user-initiated braid lock, this mirrors the cascade systemd performs on shutdown.
  4. CLI unmounts pool + closes LUKS.
  5. Plain braid lock writes done\n to /run/braid-stop-coordinator.lock.
  6. When systemd_lifecycle = true, Rust checks the mount is gone and runs systemctl stop braid-online.service synchronously so the command returns only after the lifecycle owner is inactive. The synchronous stop runs only when the post-cleanup mountpoint check confirms the mount is gone; if the check itself fails, Rust warns and skips the stop, leaving the unit active for the operator to retry. The recursive ExecStop reentry polls the coordinator, observes done\n, and exits 0.
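
Steps 2 and 3 amount to computing a stop order from the BoundBy property. A minimal sketch, assuming the property value arrives as a space-separated string; `SCRUB_UNITS` and `stop_order` are hypothetical names, and the real pre-steps also issue the systemctl calls.

```rust
/// Scrub units in the order the lock path stops them: timer first,
/// then the resume trigger, then the scrub service itself.
const SCRUB_UNITS: [&str; 3] = [
    "braid-scrub.timer",
    "braid-scrub-resume-trigger.service",
    "braid-scrub.service",
];

/// Build the full stop order from the space-separated output of
/// `systemctl show -P BoundBy braid-online.service`.
fn stop_order(bound_by: &str) -> Vec<String> {
    let mut order: Vec<String> = SCRUB_UNITS.iter().map(|u| u.to_string()).collect();
    for unit in bound_by.split_whitespace() {
        // Scrub units were already handled above, so skip them here.
        if !SCRUB_UNITS.contains(&unit) {
            order.push(unit.to_string());
        }
    }
    order
}
```

The consumer names used to exercise this (samba-smbd.service, nfs-server.service) come from the consumer contracts later in this document.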

On system shutdown:

  1. systemd stops braid-online.service (if active); its BindsTo+After cascade stops the scrub units and any full-triad consumer first. ExecStop then re-runs the same scrub-stop + BoundBy iteration as the “On lock” steps 2-3. For the scrub units and any consumer that follows the documented WantedBy+BindsTo+After triad, the cascade has already stopped them, so these re-issued stops are no-ops. A consumer that declares BindsTo without After has no stop-ordering guarantee and may still be active when ExecStop runs, so the explicit blocking stop here is what frees the mount. Running the pre-steps unconditionally covers both cases, keeping teardown code-owned and independent of cascade ordering.
  2. ExecStop = braid lock --systemd-stop --deadline-secs <n> waits for an in-flight plain braid lock to finish through the stop coordinator, or waits for the pool lock up to the configured deadline.
  3. Lock dispatch loads membership from pool.json; if pool.json is absent or corrupt, it warns and proceeds with empty membership because mapper cleanup still requires per-candidate LUKS UUID verification.
  4. CLI unmounts and closes LUKS. If sysfs reports a running btrfs balance, --systemd-stop first runs btrfs balance pause so the kernel persists the paused balance before LUKS close; if sysfs reports an already-paused balance, teardown proceeds directly to unmount. Next-boot braid recover fails closed on that persisted paused balance and preserves pending-op.json for manual inspection instead of resuming it. Plain braid lock still refuses all active exclusive operations. The systemd-stop path also retries transient umount EBUSY longer than user lock so a surviving btrfs balance process can release its mount fd during shutdown.
  5. Drives are safe to power off.
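
The balance handling in step 4 differs between plain braid lock and the systemd-stop mode. The decision table below is a sketch with invented names (`ExclusiveOp`, `LockMode`, `Teardown`, `plan_teardown`); it covers only the choice of path, not the sysfs probing or the retry loop.

```rust
/// Exclusive btrfs operation reported for the pool (illustrative).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ExclusiveOp {
    Idle,
    BalanceRunning,
    BalancePaused,
    Other,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LockMode {
    Plain,
    SystemdStop,
}

#[derive(Debug, PartialEq, Eq)]
enum Teardown {
    Unmount,
    PauseBalanceThenUnmount,
    Refuse,
}

fn plan_teardown(mode: LockMode, op: ExclusiveOp) -> Teardown {
    match (mode, op) {
        // Nothing exclusive is running: both modes unmount directly.
        (_, ExclusiveOp::Idle) => Teardown::Unmount,
        // systemd-stop pauses a running balance so the kernel persists it.
        (LockMode::SystemdStop, ExclusiveOp::BalanceRunning) => Teardown::PauseBalanceThenUnmount,
        // An already-paused balance needs no extra step in systemd-stop mode.
        (LockMode::SystemdStop, ExclusiveOp::BalancePaused) => Teardown::Unmount,
        // Plain lock refuses every active exclusive operation, and
        // systemd-stop refuses everything other than a balance.
        _ => Teardown::Refuse,
    }
}
```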

Pool lock mutual exclusion

Pool mutators, alert-state mutators, key enrollment, lock, and discover --write (unlock, add, recover, remove, remove-missing, replace, enroll, lock, discover --write, ack, monitor) acquire an exclusive flock on /run/braid-pool.lock in Rust dispatch before reading pool state. unlock, add, recover, remove, remove-missing, replace, enroll, lock, and discover --write are non-blocking fail-fast commands: if the lock is already held by another braid process, the CLI exits 1 immediately with braid: another braid operation is already in progress and the user must retry once the active operation completes. Bare discover is read-only and does not acquire the lock. ack waits up to 10 seconds before returning a retry message. monitor exits 0 silently on contention so a skipped timer cycle does not start alert notification. The lock is held through post-processing (permissions, braid-online activation/deactivation). Under the held lock, unlock re-checks whether the pool is already mounted and exits cleanly if a prior winner mounted it sequentially; other mutators operate on current locked state. See Principle 12.
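
The per-command contention behavior described above can be written as a lookup. This is a hedged sketch: `OnContention` and `on_contention` are hypothetical names, and commands not named in the paragraph simply return `None` here because their locking behavior is not covered by this section.

```rust
use std::time::Duration;

#[derive(Debug, PartialEq, Eq)]
enum OnContention {
    /// Exit 1 immediately with the "already in progress" message.
    FailFast,
    /// Wait up to the given time, then return a retry message.
    WaitUpTo(Duration),
    /// Exit 0 silently so a skipped timer cycle raises no alert.
    SkipSilently,
}

/// Returns None for bare `discover` and for commands this sketch does not cover.
fn on_contention(command: &str, discover_write: bool) -> Option<OnContention> {
    match command {
        "unlock" | "add" | "recover" | "remove" | "remove-missing" | "replace" | "enroll"
        | "lock" => Some(OnContention::FailFast),
        // Only `discover --write` takes the lock; bare discover is read-only.
        "discover" if discover_write => Some(OnContention::FailFast),
        "ack" => Some(OnContention::WaitUpTo(Duration::from_secs(10))),
        "monitor" => Some(OnContention::SkipSilently),
        _ => None,
    }
}
```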

Lock acquisition site

For non-dry-run pool mutators, alert-state mutators, key enrollment, lock, and discover --write, the operation lock is acquired in cli/src/main.rs dispatch before config load, pool.json load, journal read, identity probes, subprocess health probes, or interactive prompts. The shell wrapper must not acquire /run/braid-pool.lock; it execs the Rust binary and leaves critical-section ownership to dispatch.

A command started during another mutator could otherwise read stale state, then acquire the lock after the first command finishes and act on old inputs. Late acquisition also regresses the fail-fast UX – users see prompts and probes complete before being told the operation is contended.

The pool lock is the first real execution boundary. Do not model it after the sleep inhibitor’s late-acquisition pattern: the inhibitor protects against suspend mid-operation and can wait until the irreversible window; the pool lock protects against state-staleness and must precede any read of pool state.

ExecStop bounded-wait pattern

When a unit’s ExecStop= invokes a CLI that needs a contended resource (e.g. braid-online.service ExecStop=braid lock colliding with an in-flight mutator that holds the pool lock), the ExecStop path gets a distinct bounded-wait variant – not a fail-fast call. “ExecStop fails fast; in-flight work finishes and a later stop attempt succeeds” is not a valid design: during shutdown there is no later stop attempt. systemctl poweroff can leave the resource (mounted btrfs / open LUKS) in an inconsistent state, and the “in-flight mutator finishes before TimeoutStopSec” claim is not guaranteed.

Current pattern: braid-online.service runs braid lock --systemd-stop --deadline-secs ${braid.lockSystemdStopDeadlineSecs}. The module default is 270 seconds and an assertion requires it to be strictly less than braid-online.service TimeoutStopSec (300 seconds). That deadline bounds only stop-coordinator and pool-lock acquisition; once lock cleanup reaches btrfs balance pause or umount, any kernel wait to quiesce btrfs has no userspace timeout and is bounded only by the unit’s TimeoutStopSec (300 seconds). The systemd-stop path also has a longer transient-busy umount retry (60 attempts at 500ms) because btrfs-progs holds the mount fd while blocked in BTRFS_IOC_BALANCE_V2 and can survive the Rust parent briefly during shutdown. Regular braid lock stays fail-fast for user invocations; the bounded-wait path is documented and tested as a distinct mode.
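
The timing relationships in this paragraph can be checked with simple arithmetic. The constants below restate the documented defaults; the names and helper functions are illustrative, and whether the real retry loop sleeps after the final attempt is not stated, so the retry figure is an upper bound.

```rust
use std::time::Duration;

const TIMEOUT_STOP: Duration = Duration::from_secs(300);
const DEFAULT_DEADLINE: Duration = Duration::from_secs(270);
const UMOUNT_RETRY_ATTEMPTS: u32 = 60;
const UMOUNT_RETRY_INTERVAL: Duration = Duration::from_millis(500);

/// The module assertion: the deadline must be strictly below TimeoutStopSec.
fn deadline_is_valid(deadline: Duration, timeout_stop: Duration) -> bool {
    deadline < timeout_stop
}

/// Upper bound on the time spent sleeping between umount attempts.
fn retry_budget(attempts: u32, interval: Duration) -> Duration {
    interval * attempts
}
```

With the defaults, lock acquisition may use up to 270 s and the transient-busy retry up to about 30 s, so a worst case on both already reaches the 300 s unit timeout before any kernel quiesce time is counted.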

systemctl start/stop inside held-resource windows

systemctl start <unit> on an already-active oneshot+RemainAfterExit unit is a no-op at the work level, but it still queues a job. If a stop job for the same unit is already in flight (because someone else invoked systemctl stop), the start queues behind the stop. If that stop’s ExecStop= is itself blocked on a resource the caller holds, the result is a deadlock.

This is load-bearing for any CLI that both holds a resource and uses systemctl start/stop on a unit whose ExecStart=/ExecStop= touches that resource (e.g. Rust dispatch holding pool.lock while activating braid-online.service whose ExecStop calls braid lock).

These rules govern start/stop of braid-online.service itself. The systemctl stop calls in run_lock_pre_steps target bound consumers and scrub units, not the lifecycle owner, so they queue no job against braid-online.service and the start-behind-stop deadlock above does not apply to them.

Rules:

  1. Snapshot full unit state at the start of the held-resource window with systemctl show -P ActiveState <unit>. Do NOT use systemctl is-active – it returns “active” only for active, classifying activating and deactivating as not-active. A deactivating unit (its ExecStop is already running and waiting on the held resource) snapshotted as “not active” leads the caller to issue a start that queues behind the in-flight stop – the exact deadlock the snapshot was supposed to prevent.
  2. Only emit systemctl start <unit> at the end of the window if the snapshot was inactive or failed. Skip when active, activating, or deactivating. See ADR 026 snapshot rule.
  3. Only emit systemctl stop <unit> at the end of the window if the snapshot was active or activating. Skip when inactive, failed, or deactivating.
    • Exception: plain braid lock’s post-success mark_offline runs a synchronous systemctl stop braid-online.service without a stop-side snapshot. It is safe because /run/braid-stop-coordinator.lock plus the done\n protocol guarantees the recursive ExecStop reentry exits 0 once plain braid lock has finished cmd_lock, instead of queuing behind the in-flight stop. This coordinator is the mechanism that replaces the stop-side snapshot gate for mark_offline; see ADR 026 stop coordinator. mark_offline skips the synchronous stop when the post-cleanup mountpoint -q check itself fails (e.g. OnlineError::Spawn mid-shutdown): the unit stays active and the operator retries. Treating unknown mount state as still-mounted mirrors mark_online’s start-side fail-safe.
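
The snapshot rules can be sketched as a parser plus two predicates. The names `ActiveState`, `parse_active_state`, `may_start`, and `may_stop` are hypothetical; states outside the five discussed here parse to `None`, which a caller would treat as "skip" in this sketch.

```rust
/// The five ActiveState values the rules above distinguish.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ActiveState {
    Active,
    Activating,
    Deactivating,
    Inactive,
    Failed,
}

/// Parse the output of `systemctl show -P ActiveState <unit>`.
fn parse_active_state(output: &str) -> Option<ActiveState> {
    match output.trim() {
        "active" => Some(ActiveState::Active),
        "activating" => Some(ActiveState::Activating),
        "deactivating" => Some(ActiveState::Deactivating),
        "inactive" => Some(ActiveState::Inactive),
        "failed" => Some(ActiveState::Failed),
        _ => None,
    }
}

/// Rule 2: start only when the snapshot was inactive or failed.
fn may_start(snapshot: ActiveState) -> bool {
    matches!(snapshot, ActiveState::Inactive | ActiveState::Failed)
}

/// Rule 3: stop only when the snapshot was active or activating.
fn may_stop(snapshot: ActiveState) -> bool {
    matches!(snapshot, ActiveState::Active | ActiveState::Activating)
}
```

A deactivating snapshot permits neither a start nor a stop, which is the case that a boolean is-active check would misclassify.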

Consumer dependency contracts

Services that depend on the pool being mounted use one of three patterns:

Frequent periodic services (monitor): ConditionPathIsMountPoint only. Neither After nor BindsTo on mnt-storage.mount – those directives force systemd to load the unit, which doesn’t exist until the CLI mounts the pool at runtime (auto-generated from /proc/mounts). The condition gate silently skips the service when unmounted. Fires every 5 minutes – missed fires are cheap, so lifecycle binding is unnecessary.

Infrequent periodic services (scrub): The timer, scrub service, and resume trigger use BindsTo + After on braid-online.service; the timer and trigger are wantedBy the online unit. The timer’s active lifecycle matches the pool’s online period. Persistent=true handles catch-up for overdue fires. Unlike the monitor timer (which fires every 5 minutes and can afford missed runs), the monthly scrub timer cannot wait until next month if it misses – lifecycle binding ensures it fires on the next unlock. The scrub service and resume trigger also get ConditionPathIsMountPoint as defense-in-depth. For manual lock, Rust dispatch stops the timer, resume trigger, and scrub service before unmount (see above).

Long-running services holding open files (samba, nfs): Use the full WantedBy=braid-online.service + BindsTo=braid-online.service + After=braid-online.service triad (same shape as the scrub timer above), plus ConditionPathIsMountPoint=<pool mount>. BindsTo + After ensure systemd stops them before ExecStop runs braid lock, preventing unmount failures from busy filesystems; WantedBy ensures they restart automatically when braid unlock reactivates braid-online.service. The triad handles the unlock-start and lock-stop lifecycle, but these consumers carry their own boot or direct-start edges – NixOS wants samba-smbd.service from samba.target and nfs-server.service from multi-user.target. For starts not initiated by braid-online.service, ConditionPathIsMountPoint is the load-bearing gate that prevents serving an offline mount directory. For user-initiated lock, Rust dispatch iterates BoundBy braid-online.service and stops these consumers before unmount, mirroring the cascade systemd performs on shutdown. See ../../guides/sharing-and-permissions.md#binding-shares-to-the-pool-lifecycle for the user-facing example.

Key design constraints

  1. No hard boot dependencies. wants everywhere, never requires. Pool failure never blocks boot.
  2. Rust-synchronized lifecycle. For dispatch-managed operations, Rust keeps braid-online synchronized with pool mount state: it activates the service only after mountpoint -q succeeds, and deactivates it after a successful lock. ConditionPathIsMountPoint on the unit is defense-in-depth against direct systemctl start when unmounted. Out-of-band mount or unmount bypasses dispatch and can leave braid-online stale; braid lock handles already-unmounted pools gracefully.
  3. One passphrase prompt. braid-unlock.service is the sole interactive prompt source. The CLI opens all LUKS devices from that single passphrase.
  4. Graceful degradation. If braid-online activation fails, the pool is still mounted and usable – only the shutdown hook is missing (warned to stderr).
  5. One pool operation at a time. Enforced by a non-blocking flock in Rust dispatch, not wrapper logic or unit topology – concurrent attempts are rejected, not queued. See Principle 12.

See

  • modules/braid/storage.nix — unit definitions
  • modules/braid/monitor.nix — monitor/alert units
  • modules/braid/braid-wrapper.sh — pure exec shim
  • 026-pool-lock-rust-owned.md — Rust-owned pool lock and lifecycle synchronization
  • 003-resilient-boot.md — why no hard dependencies
  • 017-runtime-disk-membership.md — lifecycle model context
  • tests/module/systemd-lifecycle.py — state machine test suite

Inhibit Sleep During Non-Interruptible Operations

Context

braid enables whole-system suspend via autosuspend. That is the right default for a quiet, low-power NAS, but it creates a failure mode for long-running storage operations that should not be interrupted mid-flight.

btrfs replace is the motivating example. Upstream btrfs explicitly warns that suspend/hibernate can interrupt device replace and recommends inhibiting sleep before running it. On newer kernels, suspend can cancel the replace outright; on older kernels, suspend can leave braid to recover a broken topology after wake. The same risk profile applies to btrfs device remove (long-running data migration) and to the conditional balances in add and remove-missing (pool_balance_raid1 after add to ≥2 disks; maybe_restore_raid1 after clearing the last missing device).

braid needs a clear rule for when to hold a sleep inhibitor, because “just acquire it for the whole command” is too broad:

  • It is unnecessary and user-hostile to block suspend while waiting for confirmation or passphrase entry.
  • It is correct to block suspend once the command is entering the non-interruptible mutation window where interruption risks corruption, degraded topology, or restarting hours of work.

systemd guidance

braid follows systemd’s inhibitor model directly:

  • systemd-inhibit is for work that should not be interrupted, such as recording media or similarly sensitive long-running operations.
  • block inhibitors are for cases where sleep must be refused outright while the critical section is active.
  • delay inhibitors are for short grace periods where a service needs time to prepare for sleep, not for hours-long work.
  • Inhibitors should be held only for the shortest window that actually needs protection.

Decision

braid acquires a What=sleep, Mode=block inhibitor only for the non-interruptible portion of a long-running operation.

The inhibitor boundary is:

  1. Run interactive prompts, passphrase collection, and reversible validation first.
  2. Acquire the sleep inhibitor immediately before the irreversible mutation window begins.
  3. Keep it held for the full duration of the non-interruptible work, including any required follow-up work that is part of the same intent command.
  4. Release it immediately when that critical section ends, whether by success, error, or signal-driven unwind.

braid must not hold a sleep inhibitor during:

  • confirmation prompts
  • passphrase entry
  • dry-run output
  • reversible preflight that can fail without leaving partial state
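
One way to get the "release on success, error, or unwind" property is a guard whose destructor releases the inhibitor. The sketch below is not braid's implementation: `SleepInhibitor` here only records events in a shared log so the ordering is visible, and every name in it is hypothetical.

```rust
use std::cell::RefCell;
use std::rc::Rc;

type Log = Rc<RefCell<Vec<&'static str>>>;

/// Stand-in for a held logind inhibitor; it only records events.
struct SleepInhibitor {
    log: Log,
}

impl SleepInhibitor {
    fn acquire(log: &Log) -> SleepInhibitor {
        log.borrow_mut().push("inhibitor acquired");
        SleepInhibitor { log: Rc::clone(log) }
    }
}

impl Drop for SleepInhibitor {
    // Runs on every exit path from the scope that owns the guard.
    fn drop(&mut self) {
        self.log.borrow_mut().push("inhibitor released");
    }
}

fn run_operation(log: &Log, mutation_fails: bool) -> Result<(), &'static str> {
    // Interactive and reversible work happens with no inhibitor held.
    log.borrow_mut().push("prompts and validation");
    // Named binding: a bare `_` would drop the guard immediately.
    let _guard = SleepInhibitor::acquire(log);
    log.borrow_mut().push("mutation");
    if mutation_fails {
        return Err("mutation failed");
    }
    log.borrow_mut().push("follow-up work");
    Ok(())
}
```

Because release lives in `Drop`, the early `return Err(..)` needs no explicit cleanup call.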

Current application

braid replace, braid remove, braid remove-missing, and braid add all hold a What=sleep, Mode=block, Who=braid logind inhibitor for their respective mutation windows. Each command acquires the inhibitor immediately before journal::write_journal(), after all interactive/reversible work, and holds it until the function returns (success, error, or signal-driven unwind).

For all four commands, the protected scope is the critical section that begins with the journal write, and the excluded scope is the same for each:

  • --dry-run
  • confirmation prompt
  • passphrase reads
  • reversible validation and identity checks

Failure to acquire the inhibitor returns a Validation-shaped error before the journal is written, so an environmental logind failure does not strand the user in recovery mode.
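
The ordering that makes this safe is "acquire, then journal". A minimal sketch with invented names (`OpError`, `State`, `acquire_inhibitor`, `begin_mutation`); the boolean parameter simulates whether logind is reachable.

```rust
#[derive(Debug, PartialEq, Eq)]
enum OpError {
    Validation(String),
}

#[derive(Default)]
struct State {
    journal_written: bool,
}

/// Simulated acquisition; the real call would talk to logind.
fn acquire_inhibitor(logind_available: bool) -> Result<(), String> {
    if logind_available {
        Ok(())
    } else {
        Err("logind unreachable".to_string())
    }
}

fn begin_mutation(state: &mut State, logind_available: bool) -> Result<(), OpError> {
    // Acquisition failure returns before any journal state exists.
    acquire_inhibitor(logind_available)
        .map_err(|e| OpError::Validation(format!("cannot inhibit sleep: {e}")))?;
    // Only now does the operation cross the journal boundary.
    state.journal_written = true;
    Ok(())
}
```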

braid replace

The protected scope includes:

  • journal write and post-commit phase rewrite
  • new-disk LUKS initialization/open
  • btrfs replace start
  • best-effort old-mapper close for live replacements
  • post-replace resize
  • post-replace soft RAID1 balance for missing-path replacements that clear the last missing device

The new-target LUKS identity check is deliberately two-tier: the primary gate (cli/src/replace.rs#verify_existing_luks_new_target_preflight) runs pre-journal under the excluded “reversible validation and identity checks” rule above, so an operator disk-swap or backing-drift in the post-confirmation window aborts on the reversible side without stranding pending-op.json; a residual re-probe (probe_existing_luks_new_target_uuid closed-mapper arm, verify_existing_luks_open_mapper_target open-mapper arm) stays post-journal inside the “new-disk LUKS initialization/open” scope to guard the narrow journal->open window that contains the optional slot-1 keyfile enroll. Do not collapse it to one tier.

braid remove

The protected scope includes:

  • journal write
  • the optional pre-remove pool_balance_single (RAID1→single) when only one device will remain
  • btrfs device remove data migration
  • post-remove LUKS mapper close and membership persistence

braid remove-missing

The protected scope includes:

  • journal write
  • btrfs device remove <devid> (chunk relocation via btrfs_shrink_device; can run for minutes when the missing device had data allocated because surviving RAID1 stripes are rewritten into newly allocated chunks on remaining devices)
  • post-op membership persistence
  • post-commit phase rewrite
  • the conditional soft RAID1 balance that converts single-profile chunks (created during degraded operation) back to RAID1 when clearing the last missing device on a multi-disk pool

The inhibitor is acquired unconditionally before journal write, even in the cases where maybe_restore_raid1 will be a no-op. This keeps the boundary rule simple (“acquire before journal”) and matches the rest of the suite. The “savings” of skipping acquisition when the soft balance will not run are tiny on a NAS that is idle most of the time.

braid add

The protected scope includes:

  • journal write
  • LUKS format/header backup/open of fresh disks
  • pool_bootstrap_mount / pool_bootstrap_mount_raid1 (bootstrap path) or pool_add_device followed by the conditional pool_balance_raid1 (add-to-existing-pool path) when the post-add pool has ≥2 devices
  • post-op membership persistence

As with remove-missing, the inhibitor is acquired unconditionally before journal write. The bootstrap path’s mkfs phase is fast but still irreversible across the journal boundary; the add-to-existing path’s RAID1 balance is the long-running phase that the inhibitor primarily protects.

The no-op early-return path (all requested disks already in the pool) returns before the inhibitor seam fires — no journal is written, so no protection is required.

braid recover follows the same boundary for replayed destructive work. In particular, add PoolMutation recovery resolves and verifies the needed passphrase before acquiring a sleep inhibitor; the inhibitor is acquired only after reversible credential checks pass and immediately before replaying target preparation or btrfs membership work.

Excluded: braid lock

braid lock deliberately does not acquire the sleep inhibitor, even though its mutation window (umount + per-mapper cryptsetup close) is non-trivial in wall-clock time. This is the worked example of the deciding question below applied to lock work specifically:

  • Recoverability. A lock interrupted mid-flight leaves a state that re-running braid lock advances on, to the extent its existing probes can detect. Specifically:

    • plan_lock (cli/src/lock.rs) checks mountpoint -q and skips the umount step when the pool is already unmounted.
    • The per-mapper close path checks fs.exists("/dev/mapper/<name>") before issuing cryptsetup close and reports “already closed” otherwise, so closed membership mappers do not re-error on a follow-up run.
    • Orphan mappers (braid-* paths not in pool.json) are re-scanned on each invocation and closed; close failures still surface as fatal errors, and a /dev/mapper scan failure is warned and yields an empty orphan list for that run – not silently swallowed.

    Unlike replace/add/remove/remove-missing, there is no kernel-level topology corruption window and no hours-long restart cost. The point is that a partially-completed lock does not poison subsequent invocations – not that every failure is hidden.

  • Shutdown-driven ExecStop. When braid lock runs as braid-online.service’s ExecStop= during system shutdown, the system is heading to shutdown.target/power-off, not to suspend. A sleep inhibitor acquired during that window is redundant – logind does not schedule a suspend transition mid-shutdown.

  • Manual stop and user-lock reentry. ExecStop=braid lock also fires on a manual systemctl stop braid-online.service and on the Rust dispatch post-lock mark_offline (cli/src/online_state.rs) for user-initiated braid lock, gated on systemd_lifecycle (see docs/design/decisions/018-systemd-lifecycle.md:131 and modules/braid/storage.nix’s braid-online definition). Those paths do not enjoy the shutdown-driven guarantee above; their justification is the recoverability + short-duration argument, not the shutdown-target one.

  • Suspend context. braid-online.service has no Conflicts = sleep.target (see modules/braid/storage.nix). By the 016-auto-suspend.md design the pool stays mounted across suspend, so the only realistic mid-lock-suspend race is a user-initiated braid lock colliding with autosuspend’s idle countdown. That window is narrow (lock is short) and the failure mode is recoverable, per the first bullet.

  • ExecStop budget. braid-online.service runs lock under TimeoutStopSec = 5min. Adding subprocess work to that path (a systemd-inhibit fork plus its supervised sh + sleep child) buys no protection commensurate with the added shutdown-path complexity.

If a future change makes lock’s mutation window genuinely long (e.g. a multi-minute pre-lock balance), revisit this exclusion under the same deciding question.
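
The recoverability argument rests on lock planning from observed state rather than from a record of what it did last time. A sketch of that idea, with a simplified `plan_lock` that is not the real function in cli/src/lock.rs and ignores orphan mappers; `LockStep` and the mapper names are hypothetical.

```rust
#[derive(Debug, PartialEq, Eq)]
enum LockStep {
    Umount,
    Close(String),
}

/// Plan the remaining work from what is observed right now.
/// `mappers` pairs a mapper name with whether /dev/mapper/<name> exists.
fn plan_lock(mounted: bool, mappers: &[(&str, bool)]) -> Vec<LockStep> {
    let mut steps = Vec::new();
    // Skip umount when the pool is already unmounted.
    if mounted {
        steps.push(LockStep::Umount);
    }
    for &(name, exists) in mappers {
        // An absent mapper counts as already closed.
        if exists {
            steps.push(LockStep::Close(name.to_string()));
        }
    }
    steps
}
```

Re-running after an interruption yields only the steps that are still outstanding, so a partial lock does not poison the next invocation.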

Excluded: braid enroll

braid enroll does not acquire the sleep inhibitor despite mutating LUKS slot 1 on each pool disk. Applying the deciding question to standalone enroll specifically:

  • No journal, no recovery-mode lockout. Standalone enroll writes no operation journal (EnrollPlan::execute in cli/src/enroll_key_file.rs). Suspend mid-loop cannot strand the operator in recovery mode, which is the failure surface this doc’s “Validation-shaped error before journal write” promise protects against for the four inhibitor-using commands.
  • Recoverability. plan_enrollment probes each candidate via probe_keyfile_enrollment and short-circuits disks whose slot 1 already verifies the keyfile (AlreadyEnrolled). A partial enroll leaves only the un-enrolled disks for the next invocation: re-running braid enroll DIR (existing-keyfile mode) advances on partial state, the same property that justifies lock’s exclusion. Note that braid enroll --generate is not same-command idempotent – a partial --generate run leaves DIR/braid.key on disk, and validate_key_file_path refuses a second --generate against an already-present keyfile. Recovery for an interrupted --generate run is to drop --generate and re-run as a regular enroll against the now-existing keyfile.
  • Bounded mutation window. Each disk pays one Argon2-bounded cryptsetup luksAddKey (about 2-3 sec on default parameters) plus a sub-second cryptsetup luksHeaderBackup. A three-disk pool’s total enroll window is single-digit seconds with no long-running btrfs work to protect.
  • No btrfs topology mutation; LUKS2 writes use cryptsetup metadata locking. Enroll does not touch btrfs membership or chunk allocation, which is the topology-corruption risk surface this doc was written to protect. LUKS2 metadata writes are serialized by cryptsetup’s own metadata locking. After each successful cryptsetup luksAddKey, apply_enrollment writes a local .luksheader as input to the existing off-system backup workflow (see docs/internals/luks-unlock.md); the local file is a transient byproduct of a successful mutation, not a recovery mechanism for an interrupted one. Recovery from actual header damage uses the operator’s off-system backup, identical to every other LUKS-mutating command in braid.

The same luks::enroll_key_file call is held under an inhibitor when invoked from braid add --enroll or braid replace --enroll, but that is incidental: those commands already hold an inhibitor for their journal-protected btrfs work, and the keyfile call happens inside that existing window. Standalone braid enroll has no btrfs work to protect and no journal boundary to guard, so an inhibitor would buy nothing.

If a future change adds long-running follow-up work to braid enroll (e.g. a pool-wide rekey or a balance after enrollment), revisit this exclusion under the same deciding question.

Consequences

  • suspend is blocked only when interruption is actually dangerous
  • operators are not prevented from suspending the host while braid is still waiting on human input
  • add, remove, remove-missing, and replace all follow the same boundary rule; future long-running commands should reuse it instead of inventing command-specific behavior
  • failure to acquire the inhibitor (e.g. logind unreachable) is a clean validation error before the journal is written, never a recovery-mode lockout

The same default does not automatically apply to every long-running task; the deciding question is whether suspend would make the operation incorrect, unsafe, or expensive to restart.

UPS Integration

Principles:

Context

A btrfs RAID1 pool tolerates clean shutdowns, but sudden power loss during active I/O – especially during a long-running btrfs replace, btrfs device remove, or post-add/remove balance – can leave the pool in a state that requires manual recovery. This is the same risk surface that decision 019 protects against for suspend/wake, but it cannot use the same control model. A sleep inhibitor actively blocks the operating system from suspending; braid cannot analogously block a UPS from running out of battery. The control model here is different: reject avoidable starts on battery up front, and prove journal recovery for the unavoidable mid-mutation case.

A UPS solves this only if the host cooperates. NUT (Network UPS Tools) is the standard Linux interface, and nixpkgs already provides a mature power.ups module that configures NUT declaratively – units, users, udev rules, killpower handling. braid’s job is not to reimplement that, but to layer opinionated policy on top so that enabling UPS support gives a home NAS three specific guarantees:

  1. Orderly shutdown before battery exhaustion for ordinary mounted operation.
  2. Preflight refusal to start pool-mutating commands unless the UPS reports verified utility power (OL).
  3. Live UPS state visible in braid ups status and the TUI. The same live status drives the preflight safety check and upsmon’s critical-state shutdown (normally OB + LB together, per reference/nut/clients/upsmon.c:1404).

The guarantees do not extend to “safe against any power loss.” A UPS firing LB during a mutation that started on AC still interrupts that mutation. Recovery for that case falls to the existing journal + braid recover path, and must be proven per mutation class by VM tests before this decision flips to Active.

“Just alert the user on low battery” is insufficient for guarantee (1): a prolonged outage with nobody present would still exhaust the battery during an active mount. The host must power off before battery exhaustion, because decision 018’s teardown sequence (braid-online.service ExecStop -> btrfs umount -> luks close) needs a non-trivial window of live power to complete cleanly.

Decision

Scope: standalone, USB, single-host

v1 supports one NUT-compatible UPS connected over USB to the NAS, monitored by the NAS itself. Not supported as first-class:

  • networked NUT (primary/secondary across multiple machines)
  • serial apcsmart, snmp-ups, or other non-USB drivers
  • multiple UPSes per host

An escape hatch (driver = "...", port = "...") exists for users whose UPS speaks a non-USB protocol, but braid does not guarantee correct behavior outside the USB path.

Rationale: USB UPSes cover the vast majority of home NAS deployments. Every non-USB topology adds configuration surface (network auth, SNMP community strings, serial port permissions) that braid would have to validate and test. Single-host standalone avoids the two-machine primary/secondary dance and its timing/credential complexity.

Wrap power.ups, do not reimplement NUT

The braid module sets power.ups.* values from its higher-level options. It does not write ups.conf, upsd.conf, or upsmon.conf directly, and does not define its own nut-* systemd units.

This is a deliberate departure from the pattern in modules/braid/fan-control.nix, which owns its unit because nixpkgs’ hddfancontrol module has concrete lifecycle bugs. The nixpkgs power.ups module has no equivalent known defect; reimplementing its surface would duplicate work and diverge over time.

Data source: shell out to upsc

The TUI and braid ups status command read UPS state by invoking upsc <name> and parsing its key/value output. This matches every other braid parser (btrfs, cryptsetup, lsblk, smartctl, smartd, hddfancontrol).

A parse_upsc module in cli/src/parse/ handles the parse, with stable and unstable golden fixtures in cli/tests/fixtures/. NUT (networkupstools) joins btrfs-progs, cryptsetup, and util-linux in the parser-critical toolchain (see decision 010 and parser compatibility), with fixture refresh required on any nixpkgs bump that changes its pinned version.
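As a rough illustration of the parse step, the sketch below turns upsc-style `key: value` lines into a map. It assumes one variable per line; the function name `parse_upsc_output` and the `BTreeMap` return type are hypothetical and do not describe the real parse_upsc module.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper (not braid's actual API): parse `upsc <name>` output
/// into a variable -> value map. Lines without a colon are skipped.
pub fn parse_upsc_output(raw: &str) -> BTreeMap<String, String> {
    let mut vars = BTreeMap::new();
    for line in raw.lines() {
        // Split on the first colon only, so values that contain colons survive.
        if let Some((key, value)) = line.split_once(':') {
            vars.insert(key.trim().to_string(), value.trim().to_string());
        }
    }
    vars
}
```

A missing `ups.status` key and an empty `ups.status` value both remain distinguishable from a healthy status in this shape, which is what the fail-closed consumers need.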

Pinning is load-bearing. A new braid.packages.networkupstools option is added alongside the existing btrfsProgs, cryptsetup, and utilLinux pins, defaulted to nixos-26.05’s networkupstools. The module uses this pin to configure the NUT package the power.ups service resolves (exact nixpkgs option name to confirm during implementation) and includes the same derivation in the CLI wrapper’s PATH so that upsc invoked from braid ups status resolves to the tested version rather than whatever the host’s system path provides. Decision 010 and principle 10 are updated in the same implementation to name NUT as parser-critical.

ups.status is parsed into an ordered, deduplicated list of flags (OL, OB, LB, CHRG, DISCHRG, RB, …), not an enum. Flags are stored in upsc emission order; membership and dedup give set semantics without imposing a sort. Display severity is derived from the combination; unknown tokens are preserved in the parsed model so that new NUT statuses do not silently disappear.
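The ordered, deduplicated flag list can be sketched as follows. The `StatusFlag` enum and `parse_status_flags` function are illustrative names, not the real parsed model; only the behavior (first-seen order, repeated tokens collapsed, unknown tokens preserved) comes from the text above.

```rust
/// Hypothetical flag model: known tokens get a variant, anything else is
/// preserved verbatim so that new NUT statuses do not silently disappear.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum StatusFlag {
    Ol,
    Ob,
    Lb,
    Chrg,
    Dischrg,
    Rb,
    Unknown(String),
}

/// Parse a raw `ups.status` value into flags in first-seen order.
pub fn parse_status_flags(raw: &str) -> Vec<StatusFlag> {
    let mut flags: Vec<StatusFlag> = Vec::new();
    // split_whitespace normalizes runs of spaces and tabs.
    for token in raw.split_whitespace() {
        let flag = match token {
            "OL" => StatusFlag::Ol,
            "OB" => StatusFlag::Ob,
            "LB" => StatusFlag::Lb,
            "CHRG" => StatusFlag::Chrg,
            "DISCHRG" => StatusFlag::Dischrg,
            "RB" => StatusFlag::Rb,
            other => StatusFlag::Unknown(other.to_string()),
        };
        // Membership check gives set semantics without imposing a sort.
        if !flags.contains(&flag) {
            flags.push(flag);
        }
    }
    flags
}
```

A `Vec` with a membership check is enough here because a status line carries only a handful of tokens; a sorted set would lose the emission order.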

braid ups status defaults to a curated human-readable summary and supports --json for the typed parsed model. Raw upsc passthrough is not exposed; users who want that can still run upsc directly.

The --json success shape preserves the typed parsed model at top level. If upsc exits 0 but ups.status is empty or missing, the JSON output stays exit 0 and adds top-level "warning": "ups_status_empty" beside the parsed body. Scripts must treat either .error or .warning as a sentinel that the body is not trusted healthy UPS state. The status_flags array preserves first-seen ups.status token order across the human, --json, and TUI surfaces – braid imposes no sort of its own. Order is deterministic for a given UPS state (whitespace is normalized and repeated tokens collapse to first-seen); it is not a byte copy of the raw ups.status: line.

Shutdown-on-LB = systemctl poweroff

When NUT fires the low-battery (LB) event, upsmon runs systemctl poweroff. systemd’s standard shutdown sequence then unwinds braid-online.service (decision 018), which closes the btrfs mount and LUKS mappers. The host powers off via normal means before the UPS exhausts its battery.

This is not “alert only.” The host genuinely shuts down, because the only safe state during a prolonged power outage is off. An alert-only policy would require the user to react in time, which defeats the point of unattended operation.

Reject pool-mutating commands unless UPS reports utility power (preflight hygiene only)

When braid.ups.enable = true, braid add, braid remove, braid remove-missing, and braid replace query UPS status at preflight and refuse with a Validation-shaped error unless the UPS status can be trusted as explicitly on utility power. The check is fail-closed: it refuses on upsc invocation or query failure (dead upsd, unknown UPS name, or exec failure), an empty or missing ups.status, any critical flag (LB, TESTFAIL, COMMBAD, FSD – the same set the TUI paints red), on-battery (OB), or any status set missing OL. Known non-critical advisory states such as OL RB, and unknown tokens co-present with OL and no known blocker, still pass because utility power is explicitly present. The check sits alongside the existing preflight checks, before any journal write.
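The fail-closed rule can be expressed as a small decision function. This is a sketch under assumptions: `PreflightRefusal`, `check_utility_power`, and the `Result<&str, String>` input shape are hypothetical, and the order in which refusal reasons are tested is a choice made for the example.

```rust
/// Hypothetical refusal reasons for the UPS preflight check.
#[derive(Debug, PartialEq, Eq)]
pub enum PreflightRefusal {
    QueryFailed,
    StatusEmpty,
    CriticalFlag(String),
    OnBattery,
    UtilityPowerNotReported,
}

/// The critical set named in the text: LB, TESTFAIL, COMMBAD, FSD.
fn is_critical(token: &str) -> bool {
    matches!(token, "LB" | "TESTFAIL" | "COMMBAD" | "FSD")
}

/// `status` is Ok(raw ups.status value) or Err(description) when the upsc
/// query itself failed. An absent ups.status is modelled as an empty string.
pub fn check_utility_power(status: Result<&str, String>) -> Result<(), PreflightRefusal> {
    let raw = status.map_err(|_| PreflightRefusal::QueryFailed)?;
    let tokens: Vec<&str> = raw.split_whitespace().collect();
    if tokens.is_empty() {
        return Err(PreflightRefusal::StatusEmpty);
    }
    if let Some(flag) = tokens.iter().copied().find(|t| is_critical(t)) {
        return Err(PreflightRefusal::CriticalFlag(flag.to_string()));
    }
    if tokens.contains(&"OB") {
        return Err(PreflightRefusal::OnBattery);
    }
    if !tokens.contains(&"OL") {
        return Err(PreflightRefusal::UtilityPowerNotReported);
    }
    // OL is present and no known blocker is set: advisory flags such as RB
    // and unknown tokens do not block the command.
    Ok(())
}
```

Every branch other than the final one refuses, so a new failure mode that is not explicitly handled still ends in a refusal unless OL is reported.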

This is preflight hygiene, not a mutation-window guarantee. It narrows the surface that journal recovery must cover by rejecting the easy case – “user starts braid replace while the power is already out” – but it cannot and does not prevent LB from firing mid-mutation on work that started on AC. Mid-mutation power loss is handled by the existing journal + braid recover path; see the recovery-proof obligation in Open Questions.

This is not the power-side equivalent of decision 019’s sleep inhibitor. A sleep inhibitor actively blocks suspend for the duration of the mutation window; braid cannot analogously block UPS-driven shutdown, because the UPS is dying and no amount of inhibiting changes that. Instead, the contract is: reject the avoidable case up front, and rely on recovery for the unavoidable case.

Alert-model integration is deferred

Integrating UPS events into the shared AlertState / AlertCause model (decision 014) is deferred to a future ADR. Decision 014 guarantees “alerts stay latched until braid ack” – the right shape for event-driven causes (disk errors, smartd), but wrong for live-state conditions like OB / LB (users expect those to clear when the UPS returns to OL). Reconciling that requires splitting AlertCause by persistence semantics (LatchedUntilAck vs. ActiveWhileConditionHolds) and updating merge_into_latch, ack, status, and the alert-test matrix. That is a core-invariant change that deserves its own ADR; smuggling it into UPS v1 would conflate two distinct concerns.

v1 therefore surfaces UPS state only through braid ups status and the TUI. Operators who are not actively watching those surfaces will not see on-battery or comms-loss conditions asynchronously in v1. This is a known gap until the follow-up ADR lands.

braid-online becomes safety-critical under UPS

mark_online (cli/src/online_state.rs) warns and exits successfully when systemctl start braid-online.service fails after a successful unlock/add/recover. When UPS support is enabled, this silent-degradation path is unsafe: the user believes LB will trigger a clean shutdown, but without braid-online.service active, its ExecStop does not run and LUKS close is not guaranteed to complete before power dies.

braid doctor and the TUI flag “pool mounted but braid-online inactive” as a high-severity configuration fault whenever UPS support is enabled. mark_online’s warn-and-continue behavior otherwise remains unchanged; the UPS path adds a new detector, it does not change the underlying unlock sequence. Under systemd_lifecycle = false (CLI-only), the lifecycle path is skipped entirely; the UPS-safety detector fires only when systemd_lifecycle = true and UPS support is enabled.
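The detector's firing condition reduces to a conjunction of four facts. The struct and function below are hypothetical names used only to make that condition explicit; how braid actually gathers these facts is not shown.

```rust
/// Hypothetical snapshot of the facts the detector needs.
pub struct LifecycleFacts {
    pub ups_enabled: bool,
    pub systemd_lifecycle: bool,
    pub pool_mounted: bool,
    pub braid_online_active: bool,
}

/// True when "pool mounted but braid-online inactive" should be reported as
/// a high-severity fault. CLI-only setups (systemd_lifecycle = false) and
/// hosts without UPS support never trigger it.
pub fn ups_online_fault(facts: &LifecycleFacts) -> bool {
    facts.ups_enabled
        && facts.systemd_lifecycle
        && facts.pool_mounted
        && !facts.braid_online_active
}
```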

Upsmon credential lifecycle

NUT requires upsmon to authenticate to upsd even in single-host standalone mode. The credential lives at /var/lib/braid/upsmon.pass with mode 0600, owned by root, outside the Nix store.

Generation: a oneshot braid-ups-secrets.service creates the file if absent with a random token (e.g. head -c 24 /dev/urandom | base64) and exits. The oneshot is wired with before = [ "upsd.service" "upsmon.service" ] and requiredBy = [ "upsd.service" "upsmon.service" ] (the actual nixpkgs power.ups unit names), so upsd and upsmon hard-fail to start if secret creation fails rather than racing it. systemd.tmpfiles rules ensure /var/lib/braid/ exists with correct ownership before the oneshot runs. The file is stable across rebuilds; regeneration happens only on explicit deletion. No rotation is performed because the scope is loopback upsmon<->upsd on a single host.

Reference: the rendered NUT configs consume the file via power.ups.users.<name>.passwordFile and power.ups.upsmon.monitor.<name>.passwordFile (not inline passwords), so the token never enters the Nix store or nix-store --query output.

Proposed config surface

braid = {
  enable = true;

  ups = {
    enable = true;
    name = "ups";               # identifier used by upsd and upsc
    driver = "usbhid-ups";      # USB default; covers the vast majority of UPSes
    port = "auto";              # usbhid-ups's standard "find the device" value
  };
};

Defaults applied internally, not surfaced as options in v1:

  • standalone mode (upsd + upsmon on the same host, no network monitors)
  • SHUTDOWNCMD = systemctl poweroff
  • upsmon credentials per “Upsmon credential lifecycle”

Note: NOTIFYCMD is intentionally not configured in v1 – alert-model integration is deferred (see “Alert-model integration is deferred” above).

The configured name is also written to /etc/braid/config.json so that braid ups status and the TUI do not have to guess which UPS to query.

Deferred

  • networked NUT (primary/secondary across hosts)
  • non-USB drivers as first-class support (work via escape hatch, not tested)
  • pre-shutdown grace window with braid ups abort-shutdown
  • battery-age reminders driven by battery.mfr.date + the RB status flag
  • multi-UPS per host
  • UPS-triggered automatic pause of running balance (scrub is cancelled on shutdown and resumed on next pool activation; crash-paused owed RAID1 balance now fails closed in recover while idle/no-paused owed replay still runs)

Resolved questions

Each of these blocked the flip from Draft to Active. All three are now closed by VM tests committed in tests/module/.

  1. Recovery-proof for mid-mutation power loss (primary blocker). Resolved by the four VM tests in plans/impl/2026-04-21-forced-shutdown-recovery-proof.md’s matrix: ups-lb-during-replace, ups-lb-during-remove, ups-lb-during-remove-missing, and ups-lb-during-balanced-add. Each fires OB LB via upsrw while a different mutation class is in flight, lets systemctl poweroff run, reboots the VM, and runs braid recover. The idle/no-paused recovery path still asserts the post-recover state matches what the original mutation would have produced – including no orphaned LUKS mappers, no MISSING btrfs entries, no remaining single-profile chunks where RAID1 was intended, and a cleared pending-op.json. The crash-paused owed RAID1 subcase is intentionally narrower: ups-lb-during-remove-missing and the paused branch of ups-lb-during-balanced-add now assert that recover preserves pending-op.json, leaves single-profile chunks visible, and asks for manual btrfs inspection instead of replaying a balance. The Pre-M11 audit also surfaced two cli/src/recover.rs gaps that the same plan landed before the matrix ran: pool_resize_device is now replayed for OpKind::Replace, and a soft RAID1 balance is replayed for OpKind::Add, OpKind::RemoveMissing, and OpKind::Replace only when btrfs balance status is idle; see balance-soft for the underflow rationale behind the fail-closed branch.
  2. Shutdown ordering for ordinary mounted operation. Resolved by tests/module/ups-lb-clean-shutdown.{nix,py} (Plan 1’s M7). The VM test mounts an idle pool, fires OB LB via upsrw, and asserts braid-online.service’s ExecStop completes (and is not killed by TimeoutStopSec) before poweroff. The default TimeoutStopSec = 5min is sufficient for a single-disk pool; larger pools should retain that headroom.
  3. Battery-low threshold. Resolved with the upstream NUT default. Plan 1’s M7 (ups-lb-clean-shutdown) passed without raising battery.runtime.low from its driver-dependent default (often 120s). That test deliberately imports tests/module/lib/ups-fixture.nix with upsmonTimings = null so upsmon runs at upstream POLLFREQ/POLLFREQALERT/FINALDELAY = 5/5/5 – the runtime-budget claim is therefore backed by representative timings, not the squeezed 1/1/0 cadence the Plan 3 matrix tests use to keep the LB-detection window narrower than an in-flight mutation. Larger real-world pools that risk exceeding the default budget can override power.ups.upsmon.settings (or the driver’s battery.runtime.low) at the deployment level; braid does not need a dedicated option for v1.

Consequences

  • enabling UPS support is one line of Nix, plus two optional strings for non-default drivers
  • for ordinary mounted operation, the host powers off cleanly on low battery without user intervention
  • pool-mutating commands refuse to start unless utility power (OL) is verified, narrowing the journal-recovery surface to the mid-mutation case
  • mid-mutation power loss is a supported recovery case, not a guarantee: braid recover is load-bearing for replace / remove / remove-missing / balanced add interrupted by LB-driven shutdown, and VM tests prove both the idle/no-paused success path and the crash-paused owed RAID1 fail-closed path
  • live UPS state is visible in braid ups status and the TUI; users not actively watching those surfaces do not get asynchronous notifications in v1 (alert-model integration deferred to a future ADR)
  • NUT joins btrfs-progs, cryptsetup, and util-linux as a pinned parser-critical tool; nixpkgs bumps touching networkupstools trigger the same fixture-refresh obligation as the other three
  • the existing braid-online.service lifecycle (decision 018) is load-bearing under UPS; its failure mode is no longer acceptable silent degradation and braid doctor reflects that

Superseded by Principle 13.

Wait rows in unlock and shared mount helpers

Principle: 13. Announce long-running work

Context

The single-passphrase invariant (Principle 4) requires braid unlock to verify the supplied credential against every reachable LUKS member before opening any mapper. On a 3-disk pool that is three sequential Argon2 derivations – visible to the user as three back-to-back [wait] passphrase: checking against ... rows.

Two later phases of the same command stayed silent:

  1. Per-disk cryptsetup luksOpen. cryptsetup re-derives Argon2 inside luksOpen even after --test-passphrase already verified. The user saw three [ok] disk X: unlocked rows arrive one by one with no leading announcement.
  2. Mount phase. scan_and_mount runs btrfs device scan + mkdir + mount in sequence and emits a single [ok] pool: mounted ... row at the end. None of the three steps are announced individually.

The result: between the last verify row and the mount row, the user stared at an inactive terminal for several seconds with no signal that anything was happening.

Options considered

  1. TTY spinner / progress bar. Rejected: requires a TTY, fights log capture, and looks broken inside braid-auto-unlock.service journals.
  2. Best-effort ad-hoc waits. Add [wait] rows whenever a gap is noticed. Rejected: gaps recur whenever a new slow path is added.
  3. Codify “[wait] before every long-running step” as a project principle now. Rejected: principles are authoritative (docs/design/principles.md); a principle the codebase doesn’t satisfy on the day it lands is a documentation bug. add, replace, remove, remove-missing, recover’s own (non-shared) slow paths, enroll, and lock keep silent gaps today.
  4. Scope the rule to braid unlock and the shared mount helpers today; promote to a principle once the other commands comply. Accepted.

Decision

braid unlock – and braid recover’s mount tail, which routes through the same shared helpers – emit a [wait] row before every long-running step:

  • per-disk cryptsetup luksOpen (passphrase and keyfile arms), worded [wait] disk {name}: unlocking...;
  • the mount phase (btrfs device scan + mkdir + mount), worded [wait] pool: mounting {mount_point}.... The single row covers all three steps because emitting one row per step would be noisy without buying the user any actionable information.

Each [wait] row uses status_tag::status_line(StatusTag::Wait, ...) and is closed by the existing per-step success row.
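A minimal sketch of how such rows might be built is shown below. The real signature of status_tag::status_line is not given in this document, so the three-argument form, the `label` helper, and the variant list are assumptions made for illustration.

```rust
/// Hypothetical tag set; the document names Wait, and shows [ok] and [skip] rows.
#[derive(Clone, Copy)]
pub enum StatusTag {
    Wait,
    Ok,
    Skip,
}

impl StatusTag {
    fn label(self) -> &'static str {
        match self {
            StatusTag::Wait => "wait",
            StatusTag::Ok => "ok",
            StatusTag::Skip => "skip",
        }
    }
}

/// Assumed shape: "[tag] subject: message".
pub fn status_line(tag: StatusTag, subject: &str, message: &str) -> String {
    format!("[{}] {}: {}", tag.label(), subject, message)
}

/// Announce first, then do the slow work, then close with the success row.
pub fn unlock_rows(disk: &str) -> Vec<String> {
    let subject = format!("disk {}", disk);
    let wait = status_line(StatusTag::Wait, &subject, "unlocking...");
    // ... the slow cryptsetup luksOpen call would run here ...
    let done = status_line(StatusTag::Ok, &subject, "unlocked");
    vec![wait, done]
}
```

Because each row is a complete line rather than a spinner update, the output reads the same on a TTY and in a service journal.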

Other interactive commands keep their current behavior until they are individually updated.

Tradeoffs accepted

  • Slightly more verbose stderr.
  • Enforcement for the in-scope helpers is by VM-test assertion in tests/cli/braid-unlock.py, tests/cli/braid-unlock-key-file.py, and tests/cli/braid-recover.py. Project-wide enforcement is deferred until promotion to a principle.
  • braid recover inherits the new rows automatically because execute_mount_only and execute_unlock_and_mount are shared. This is desirable: recover’s mount tail is exactly the same long-running work as unlock’s.

Promotion outcome

Promoted to Principle 13 once add, replace, remove, remove-missing, recover’s replay tail and self-mount remount cycle, lock, and enroll were brought into compliance.

See

  • cli/src/mount.rs – open_disks_with_credential and scan_and_mount host the new rows.
  • cli/src/status_tag.rs – the canonical StatusTag::Wait and status_line helpers.
  • cli/src/unlock.rs:93-96 – the already-mounted short-circuit that returns before any helper runs (so already-mounted unlocks emit no new rows).

Dry-run preview model

Principles:

Context

Intent commands originally mixed dry-run rendering seams with execution planning. Some commands compiled Vec<Step> directly for preview tests, while execution consumed separate command-specific state. That made it too easy for a dry-run preview to drift from the work a real run would perform, especially around LUKS preparation, btrfs mutations, journals, cleanup, and follow-up maintenance such as resize or balance.

The current model keeps dry-run preview and execution tied to the same typed semantic decision. Step is only the output shape used to show a preview; it is not the plan.

Decision

For migrated mutating commands, dispatch owns the read-side fences that must run under the pool lock before the planner starts: pending-operation preflight and config loading. The pending-operation preflight must run before config load so a recovery journal is never hidden behind a config parse error. The planner then owns pool state loading, live probes, accumulated preview notes, and construction of a typed work plan. This split finishes the Rust-owned pool-lock migration: the lock boundary and the config/journal reads it protects now live above plan_*(), while dry-run and real execution still share the same typed plan. The command wrapper calls the planner first. On --dry-run, it prints plan.preview() to stdout. On a real run, it passes the same plan to execute().

A successful command plan carries:

  • accumulated PreviewNotes, in the order they must render;
  • a typed WorkPlan containing the semantic choices execution needs.

preview() is the public dry-run boundary. It constructs a Preview whose steps come from work_plan.render_steps(). Notes render first, then steps. A plan struct must not cache a rendered Vec<Step> alongside its work plan.

execute() consumes the same typed WorkPlan. It must not rediscover or reinterpret semantic choices already made during planning. It may still perform execution-time validation that dry-run intentionally cannot do, such as checks that require a passphrase or a mapper that was closed during planning.

Step is output-only. It may describe risk, human text, and representative commands for dry-run rendering, but it must not become an execution source, a planning cache, or a second semantic model.

When planning accumulates notes and then fails later, use a report shape that returns both the error and the accumulated notes. The command wrapper renders those notes to stderr before returning the error, using the same preview note renderers that dry-run stdout uses. This preserves context without duplicating wording.
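The relationship between plan, preview, and execution can be sketched with a toy remove plan. All type shapes here (`RemovePlan`, `RemoveWorkPlan`, the fields, the step wording) are hypothetical; the point is only that steps are derived on demand from the typed work plan and that execution reads the same work plan rather than the rendered steps.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Step {
    pub description: String,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PreviewNote(pub String);

pub struct Preview {
    pub notes: Vec<PreviewNote>,
    pub steps: Vec<Step>,
}

impl Preview {
    /// Notes render first, then steps.
    pub fn render(&self) -> String {
        let mut out = String::new();
        for note in &self.notes {
            out.push_str(&format!("note: {}\n", note.0));
        }
        for (i, step) in self.steps.iter().enumerate() {
            out.push_str(&format!("{}. {}\n", i + 1, step.description));
        }
        out
    }
}

/// Semantic choices made during planning (illustrative fields).
pub struct RemoveWorkPlan {
    pub disk: String,
    pub convert_to_single: bool,
}

impl RemoveWorkPlan {
    /// Output-only rendering of the semantic plan.
    pub fn render_steps(&self) -> Vec<Step> {
        let mut steps = Vec::new();
        if self.convert_to_single {
            steps.push(Step { description: "balance: convert RAID1 -> single".to_string() });
        }
        steps.push(Step { description: format!("btrfs device remove {}", self.disk) });
        steps
    }
}

/// The plan stores notes and semantic work, never a cached Vec<Step>.
pub struct RemovePlan {
    pub notes: Vec<PreviewNote>,
    pub work_plan: RemoveWorkPlan,
}

impl RemovePlan {
    pub fn preview(&self) -> Preview {
        Preview { notes: self.notes.clone(), steps: self.work_plan.render_steps() }
    }

    /// Stand-in for real execution: returns the actions it would perform,
    /// decided from the work plan fields, not from rendered steps.
    pub fn execute(self) -> Vec<String> {
        let mut actions = Vec::new();
        if self.work_plan.convert_to_single {
            actions.push("balance-convert-single".to_string());
        }
        actions.push(format!("device-remove:{}", self.work_plan.disk));
        actions
    }
}
```

Since `preview()` recomputes steps each time, there is no second copy of the plan that could drift from what `execute()` consumes.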

Output contract

The structured dry-run preview lives on stdout. Preview notes are part of that stdout preview. Real-run notes, and notes preserved on a later planning error, render to stderr through the shared preview renderers so warning and info wording stays byte-compatible across modes.

Confirmation UI

Confirmation UI is not a preview note. The interactive !params.yes block – the command summary, yes/no prompt, and go/no-go safety warnings attached to that prompt – is deliberately absent from both --dry-run and --yes output. In cli/src/remove.rs and cli/src/replace.rs, the 1-disk redundancy warning belongs to this class because it gates the operator’s final decision about an explicitly requested action, rather than reporting a discovered precondition.

For remove 2->1, dry-run still surfaces the redundancy-loss consequence as the RAID1 -> single balance step. For replace, the 1-disk warning is confirmation-only context for a pool that is non-redundant before and after; dry-run previews the replacement steps, and no redundancy-changing step exists for that warning.

Long-running side-effect-free probes that run while building a preview may emit [wait] / [ok] / [skip] status rows to stderr per Principle 13. Those rows are not part of the structured preview.

Fresh-format identity placeholder

A fresh LUKS format mints its identity per-invocation at plan time (ADR-024), so the UUID a real run will write does not yet exist when dry-run renders. Showing the minted UUID would make the preview non-reproducible – two dry-runs of the same command would differ – and misleading, since that value is discarded when dry-run returns and a later real run mints a different one.

So the two fresh-format render sites (cli/src/add.rs#AddWorkPlan::render_steps and cli/src/replace.rs#ReplaceWorkPlan::render_steps) emit a preview-only cli/src/cmd.rs#CmdRequest, CryptsetupLuksFormatPreview, whose to_argv renders a fixed --uuid '<generated-at-format-time>' placeholder (single-quoted by shell_words). The real run uses CryptsetupLuksFormat with the journaled identity. Both render through one shared cli/src/cmd.rs#luks_format_argv builder, so a future luksFormat flag appears in both at once – the “representative commands” / “Step is output-only” rules in the Decision section still hold; this is the one place the rendered command intentionally diverges from the real argv. The preview variant is never executed: cli/src/cmd.rs#RealRunner hard-errors on it via cli/src/cmd.rs#CmdRequest::is_preview_only before any spawn.
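The shared-builder arrangement can be sketched as below. The variant and function names follow the text, but the field layout, the flag set passed to cryptsetup, and the `run` stand-in are assumptions for illustration and not braid's real argv or runner.

```rust
/// Fixed text shown instead of a UUID that does not exist yet at preview time.
pub const UUID_PLACEHOLDER: &str = "<generated-at-format-time>";

/// Shared builder: both variants go through it, so a new flag appears in both.
/// The flags listed here are illustrative only.
pub fn luks_format_argv(device: &str, uuid: &str) -> Vec<String> {
    ["cryptsetup", "luksFormat", "--type", "luks2", "--uuid", uuid, device]
        .iter()
        .map(|s| s.to_string())
        .collect()
}

pub enum CmdRequest {
    CryptsetupLuksFormat { device: String, uuid: String },
    CryptsetupLuksFormatPreview { device: String },
}

impl CmdRequest {
    pub fn to_argv(&self) -> Vec<String> {
        match self {
            CmdRequest::CryptsetupLuksFormat { device, uuid } => luks_format_argv(device, uuid),
            CmdRequest::CryptsetupLuksFormatPreview { device } => {
                luks_format_argv(device, UUID_PLACEHOLDER)
            }
        }
    }

    pub fn is_preview_only(&self) -> bool {
        matches!(self, CmdRequest::CryptsetupLuksFormatPreview { .. })
    }
}

/// Stand-in for a runner: refuses preview-only requests before any spawn.
pub fn run(request: &CmdRequest) -> Result<Vec<String>, String> {
    if request.is_preview_only() {
        return Err("preview-only request must never be executed".to_string());
    }
    // A real runner would spawn the process here; the sketch returns the argv.
    Ok(request.to_argv())
}
```

The argv holds the raw placeholder; any shell quoting is a rendering concern applied when the command is displayed.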

recover is excluded: cli/src/recover.rs#render_add_pool_mutation_recovery_steps also emits CryptsetupLuksFormat, but its UUID comes from the committed journal – reproducible and meaningful – so recover keeps rendering the real identity.

Scope

The typed work-plan preview model is the precedent for add, replace, remove, remove-missing, and recover.

Recover is the one deliberate exception to the read-side planner rule. When recovering an interrupted existing-pool add and the pool is not already mounted, plan_recover reconciles the validated add-targets – those present, LUKS-openable, and not yet pool members – before mount: it opens any whose mapper is closed (resolving the unlock credential once, and only then), and btrfs-scans a target only when its mapper shows a btrfs signature. All of this is gated by !dry_run (discover_add_targets_before_mount, after an already-mounted short-circuit). The preflight is non-destructive and exists for two reasons: resolving the credential in the preflight window where an interactive prompt belongs, then caching it so execute reuses it without a second prompt (single passphrase, Principle 4); and making an already-committed-but-closed target visible to the kernel before the initial mount so the mount assembles it instead of recover re-adding or re-formatting it. It is not a general license to mutate inside plan_*().

The LUKS-UUID-identity migration also gave lock a typed close set (LockCloseSet carrying ordered LockMapperClose entries in cli/src/lock.rs). Dry-run step compilation (compile_lock_steps), btrfs device scan --forget, and LockPlan::execute all read from that close set so preview and real execution share one identity classification. LockPlan::preview() derives Vec<Step> on demand from the close set rather than caching rendered steps.

Older dry-run seams in unlock and enroll may remain until those commands are intentionally migrated. Do not use their older helpers or cached step fields as precedent for commands already on the typed work-plan model.

Consequences

  • Tests about user-visible dry-run output should prefer plan_*() followed by plan.preview().render().
  • Tests about the step list should use plan.preview().steps.
  • Narrow leaf-renderer tests may call work_plan.render_steps() directly when reaching the case through plan_*() would require noisy unrelated setup.
  • New migrated command plans should store semantic work, not rendered steps.

See

  • cli/src/preview.rs – Preview, PreviewNote, and canonical rendering.
  • cli/src/cmd.rs – Step and dry-run command rendering.
  • docs/design/decisions/012-intent-cli.md – intent-command safety model and dry-run probe constraints.
  • plans/impl/2026-05-06-unify-cli-plan-execution.md – historical implementation plan for the migration that introduced this typed work-plan preview model.

Secret handling discipline

Related:

Context

braid handles two kinds of LUKS secret material in process memory:

  • user-entered passphrases used for cryptsetup open, verify, format, and keyfile enrollment;
  • generated keyfile bytes used for slot-1 auto-unlock enrollment.

These values must exist briefly in process memory, but they should not escape that narrow window through ordinary Rust strings, buffered readers, command arguments, debug output, or long-lived scopes.

Decision

LUKS passphrase plaintext is represented by secret::Passphrase, a newtype around Zeroizing<String>. LUKS keyfile byte buffers remain Zeroizing<[u8; KEYFILE_SIZE]> because the generated bytes never leave the function frame that writes them.

Every passphrase read path must use unbuffered Read, not BufRead, and must consume input one byte at a time into pre-sized zeroizing storage. This avoids std-internal buffering that can retain plaintext outside braid-owned Zeroizing values. Confirmation reads in cli/src/confirm.rs intentionally accept Read, not BufRead, for the same reason: confirmation must not pre-drain bytes needed by a later --passphrase-stdin read.

Every secret-bearing read must enforce a hard byte cap while reading. Passphrase reads use PASSPHRASE_MAX_BYTES = 64 * 1024; confirmation reads use CONFIRM_MAX_BYTES = 256. New secret-read sites must declare and enforce their own cap instead of allowing unbounded growth of a zeroizing buffer.
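The read discipline described above can be sketched with the standard library alone. `WipeOnDrop` is a simplified stand-in for a zeroizing buffer (the real code uses Zeroizing), and `read_secret_line` with its newline-terminated contract is a hypothetical helper, not braid's actual function.

```rust
use std::io::{self, Read};

pub const PASSPHRASE_MAX_BYTES: usize = 64 * 1024;

/// Simplified stand-in for a zeroizing buffer: overwrites its bytes on drop.
pub struct WipeOnDrop(Vec<u8>);

impl WipeOnDrop {
    pub fn as_bytes(&self) -> &[u8] {
        &self.0
    }
}

impl Drop for WipeOnDrop {
    fn drop(&mut self) {
        for b in self.0.iter_mut() {
            // Volatile write so the compiler cannot elide the wipe.
            unsafe { std::ptr::write_volatile(b, 0) };
        }
    }
}

/// Read up to the first newline (or EOF), one byte at a time, from an
/// unbuffered reader. Fails once the secret would exceed `max_bytes`.
pub fn read_secret_line<R: Read>(reader: &mut R, max_bytes: usize) -> io::Result<WipeOnDrop> {
    // Pre-sized storage: pushes below capacity never reallocate, so no
    // plaintext copy is left behind in a freed allocation.
    let mut buf = WipeOnDrop(Vec::with_capacity(max_bytes));
    let mut byte = [0u8; 1];
    loop {
        match reader.read(&mut byte) {
            Ok(0) => break,
            Ok(_) => {
                if byte[0] == b'\n' {
                    break;
                }
                if buf.0.len() == max_bytes {
                    byte[0] = 0;
                    return Err(io::Error::new(io::ErrorKind::InvalidData, "secret exceeds byte cap"));
                }
                buf.0.push(byte[0]);
            }
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    byte[0] = 0;
    Ok(buf)
}
```

Reading one byte at a time also means the reader is left positioned exactly after the newline, so bytes intended for a later read are not consumed early.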

Anything inside a Passphrase must reach subprocesses through CommandRunner::run_with_stdin, never through CmdRequest::to_argv. The Passphrase::expose_secret() method is the grep-friendly plaintext egress point for these handoffs. ps(1) must never be able to surface a passphrase.

Generated random secrets must drop before any later syscall whose duration is unbounded. In particular, generated keyfile bytes are scoped so the Zeroizing<[u8; KEYFILE_SIZE]> is dropped before the durability sync_all() on the written file.

Every type that owns secret bytes must implement Debug with redacted output. The canonical rendering is <redacted>.
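
The redaction rule can be illustrated with a small sketch. The Passphrase below is a simplified stand-in (the real type wraps Zeroizing<String>, which this std-only example omits), and OpenRequest is a hypothetical container added only to show that a derived Debug stays safe once the secret field redacts itself.

```rust
use std::fmt;

/// Simplified stand-in for `secret::Passphrase`; zeroization is omitted here.
struct Passphrase(String);

impl Passphrase {
    fn new(plaintext: String) -> Self {
        Passphrase(plaintext)
    }

    /// The single, grep-friendly plaintext egress point.
    fn expose_secret(&self) -> &str {
        &self.0
    }
}

impl fmt::Debug for Passphrase {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Canonical redacted rendering; the plaintext is never formatted.
        f.write_str("<redacted>")
    }
}

/// Hypothetical request type: deriving `Debug` is safe because the secret
/// field supplies its own redacted implementation.
#[derive(Debug)]
struct OpenRequest {
    mapper: String,
    passphrase: Passphrase,
}
```

Formatting an OpenRequest with {:?} prints the mapper name and <redacted> in place of the passphrase; the plaintext is reachable only through expose_secret().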

braid does not use in-process passphrase equality as an authentication mechanism. Normal passphrase verification is delegated to cryptsetup/LUKS. The only current in-process comparison is the local double-prompt confirmation flow for fresh formatting, where braid checks that two user-entered strings match before one becomes the new pool passphrase.

Threat Model

These rules harden braid’s in-process memory image against accidental plaintext retention in process snapshots, core dumps, and swap residue. They do not defend against a privileged attacker on the running host with ptrace, /dev/mem, root access to /proc/<pid>/mem, or equivalent capabilities.

The target invariant is narrower: no plaintext beyond the smallest practical in-process window, and no untyped plaintext values at module boundaries.

Active – Refines 017-runtime-disk-membership.md.

Decision: LUKS UUID Is Disk Identity

Principle: Stable identifiers

Context

Runtime membership originally used the operator disk name as the key in pool.json. The same name also appears in mapper names and LUKS labels, so code could accidentally treat display/runtime handles as identity. That made label drift, mapper drift, and cloned disks hard to reason about: a member could be the same encrypted device while its label or mapper path changed, or two different by-id paths could expose the same cloned LUKS header.

Decision

Use the LUKS UUID as the persistent disk identity. pool.json and pending-op.json membership snapshots are keyed by canonical LUKS UUIDs. DiskMember.name remains the operator-facing name and DiskMember.by_id remains the hardware address used to reach the device. DiskMember.devid is persisted only as prior-binding state for btrfs cases where the live device is observable by devid but not by LUKS UUID, such as null_underlying mappers and missing_devids.

Fresh add and replace operations pre-generate the UUID that cryptsetup must write, store that UUID in the journal before mutation, and pass it through the structured CryptsetupLuksFormat request. User-supplied --luks-format-arg values may not override --uuid or --label.

Identity Boundaries

| Identifier | Role | Persistent identity? | Normal user vocabulary? |
| --- | --- | --- | --- |
| LuksUuid | Encrypted-volume identity used for membership correlation, journals, duplicate detection, and live probe checks. | Yes | No |
| DiskName | Operator-facing name used in commands, status summaries, mapper suffixes, and labels. | No | Yes |
| ByIdPath | Hardware address used to find, open, or format a disk before it is mapped. | No | Setup and repair only |
| DiskMember.devid | Prior btrfs binding used when btrfs can report a device by devid but no live LUKS UUID is observable. | Fallback binding only | Repair diagnostics only |
| braid-<DiskName> mapper name | Runtime handle passed to cryptsetup, btrfs, mount, and close operations. | No | Mostly hidden |
| braid-<DiskName> LUKS label | Human/debug label for LUKS headers and discovery bootstrapping. | No | Mostly hidden |

This means UUID identity does not move normal command vocabulary from names to UUIDs. Operators still add, replace, remove, and read disks by names such as toshiba1. UUIDs belong in pool.json, journals, machine-readable status, and diagnostics where braid must prove that the encrypted member is the expected one.
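
As a rough sketch of what UUID-keyed membership looks like in code, the example below uses simplified, hypothetical shapes (plain strings instead of braid's typed LuksUuid, DiskName, and ByIdPath). The point it illustrates is the join direction: a live device's UUID is looked up to obtain the operator name, and the observed mapper name is never an input.

```rust
use std::collections::BTreeMap;

/// Simplified member record; field names follow the text, types are assumed.
#[derive(Debug, Clone, PartialEq)]
struct DiskMember {
    name: String,       // operator-facing name, e.g. "toshiba1"
    by_id: String,      // hardware address used to reach the device
    devid: Option<u64>, // prior btrfs binding, fallback only
}

/// Membership keyed by canonical LUKS UUID (a plain string in this sketch).
type Membership = BTreeMap<String, DiskMember>;

/// Resolves a live device's LUKS UUID back to the operator-facing name.
/// Returns `None` for a UUID that is not a recorded member.
fn display_name<'a>(membership: &'a Membership, live_uuid: &str) -> Option<&'a str> {
    membership.get(live_uuid).map(|m| m.name.as_str())
}
```

With this shape, a member opened under a drifted mapper such as braid-WRONG still resolves to its recorded name, because only the UUID takes part in the lookup.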

Benefits

  • Single source of truth. pool.json has one persistent member identity: the LUKS UUID map key. Disk name, by-id path, and btrfs devid no longer duplicate or compete with a value-side luks_uuid field.
  • Drift-tolerant member correlation. Commands resolve membership by UUID instead of reconstructing identity from braid-<name>. A member opened under a drifted mapper can still be recognized as the same disk, and cleanup paths close the observed mapper rather than the expected one.
  • Safer recovery replay. Journals carry UUID-keyed pre-operation and target membership snapshots. Recovery can compare the live pool against the journaled member set by UUID/devid and re-check live UUIDs before replaying format, add, replace, resize, or close steps.
  • Earlier clone and swap detection. Duplicate LUKS UUIDs are rejected before membership writes or destructive operations, and UUID mismatches catch disks that were swapped, cloned, or reformatted after the original plan was made. add and replace also re-probe the mounted pool at execution time before writing the journal, so confirmation/passphrase-window races still hit the UUID guard.
  • Human-facing names stay human-facing. Operators still type and read disk names such as toshiba1; mapper names and labels remain braid-<DiskName>. UUIDs appear where they help diagnostics or machine-readable state, not as the normal command vocabulary.
  • Present-device probes use live paths. Queries such as lsblk model/serial and smartctl use the live backing path (PoolState::underlying_for_uuid), and the TUI disk-detail LUKS metadata dump (cryptsetup luksDump) reads the live backing path for a verified-present (Unlocked) member – not persisted by-id setup/repair handles that can drift while the disk is still present. Metadata for locked or ownership-unverified mappers stays on the by-id handle.
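
The clone and swap guards described in the list above reduce to two small checks, sketched here under assumed names (IdentityError, reject_duplicate, and verify_live_uuid are illustrative, not braid's API): a candidate UUID must not already be known, and a disk re-probed at execution time must still carry the UUID the plan recorded.

```rust
use std::collections::BTreeSet;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum IdentityError {
    DuplicateUuid(String),
    UuidMismatch { expected: String, found: String },
}

/// Rejects a candidate whose UUID is already present in the known set
/// (recorded membership or live pool devices), e.g. a cloned LUKS header.
fn reject_duplicate(known: &BTreeSet<String>, candidate: &str) -> Result<(), IdentityError> {
    if known.contains(candidate) {
        return Err(IdentityError::DuplicateUuid(candidate.to_string()));
    }
    Ok(())
}

/// Re-checks that the disk still carries the planned UUID, catching a disk
/// that was swapped or reformatted after the plan was made.
fn verify_live_uuid(expected: &str, live: &str) -> Result<(), IdentityError> {
    if expected != live {
        return Err(IdentityError::UuidMismatch {
            expected: expected.to_string(),
            found: live.to_string(),
        });
    }
    Ok(())
}
```

Both checks are cheap enough to repeat at each mutation boundary, which is what closes the confirmation and passphrase-prompt race windows.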

Concrete Improvements

  • Membership shape is simpler. Membership has one identity axis: UUID keys map to name/by-id/devid metadata.
  • Formatting is crash-replayable. Fresh add and replace paths generate the UUID before mutation, journal it, and pass it to cryptsetup. Recovery can tell whether it is seeing the exact LUKS container that the interrupted plan intended to create.
  • Cleanup follows observed ownership. lock classifies live mappers by UUID/devid and closes the mapper it actually observed. A mapper opened as braid-WRONG but owned by disk1 is closed as braid-WRONG; braid does not merely try braid-disk1 and leave the real mapper open.
  • Recovery compares member sets by identity. Pending operations carry UUID-keyed pre-operation and target membership snapshots, so recovery can compare live topology with the journaled member set instead of re-discovering by label or assuming names still line up.
  • Display code has an explicit join rule. User-facing summaries resolve a live pool device’s UUID back to DiskName for presentation. UUIDs remain available to verbose/machine-readable paths where they are useful evidence. The TUI Data-tab Bus column is the last display correlation to adopt this rule: its lsblk transport bridge now joins the parent disk’s LUKS UUID to the member name, so transport survives mapper drift like every sibling cell instead of blanking to --.

Runtime Handles And Labels

  1. Mapper names remain braid-<DiskName>.
  2. LUKS labels remain braid-<DiskName>.
  3. Both mapper names and labels are presentation/runtime handles, not identity.
  4. LuksUuid is the only persistent identity for membership decisions.
  5. Code may construct mapper_name(&member.name) when opening or addressing braid’s expected mapper.
  6. Code must not parse mapper names or LUKS labels to decide membership, target a member, or correlate live pool state. Narrow exceptions are allowed for bootstrapping and sanity checks only: discover bootstraps from cold braid-labeled disks; returning-disk adoption in add may gate on label match, but identity correlation still uses LuksUuid/devid/FSID; fresh add and replace recovery may require the expected label before treating an already-formatted target as the crash-created LUKS container, but still requires the journaled UUID to match. lock may use the braid-* prefix only to discover cleanup candidates; member identity still requires UUID/devid evidence, and candidates whose backing LUKS UUID cannot be verified are skipped with a warning.
  7. lock is the special cleanup case: classify live mappers by UUID/devid first, then close the observed mapper name, not a reconstructed mapper_name(&member.name), so drifted-but-member-owned mappers are closed correctly. If mounted per-device probing fails, lock reads the mounted filesystem FSID to key the exclusive-operation preflight (so it will not unmount mid balance/replace), then scans /dev/mapper/braid-* candidates and closes only those with verified backing LUKS UUIDs. The unmount is licensed by mount-point ownership, not an FSID identity match (see Limits And Non-Goals). If a null_underlying mapper’s persisted devid resolves to multiple membership UUIDs, lock warns, leaves that mapper open, and marks cleanup uncertain instead of demoting it to orphan cleanup. lock reports disk <name>: already closed only for members the planner has proved absent from every observed live state; it must not reconstruct mapper_name(&member.name) during execute to infer absence. If a mapper is skipped because classification failed, or /dev/mapper cannot be enumerated in either close-set arm, cleanup is uncertain and lock suppresses all already-closed claims for unobserved members.
  8. Commands that reuse an already-open expected mapper for a requested by-id path must verify the mapper’s canonical backing path before trusting the mapper’s LUKS UUID. A cloned LUKS header can give two physical devices the same UUID, so the runtime proof is backing path match first, then UUID match.
  9. Recovery must fail closed when a live btrfs device lacks an observable LUKS UUID and the journal has no persisted devid binding. It must not recover by inferring identity from braid-<DiskName>.
  10. replace must re-probe the mounted pool after confirmation and passphrase verification but before sleep inhibitor acquisition, journal write, or btrfs replace start. If the pool is no longer mounted, the FSID differs from the planned pool, or any live pool device has the replacement target’s LUKS UUID, replace fails closed with the canonical pre-journal validation or DuplicateUuid { scope: LivePool } refusal.
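
Rule 8's proof order can be sketched as follows. The types and names are hypothetical; what matters is the ordering of the two checks, so that a cloned header presenting the right UUID from the wrong physical device is refused on the backing path before its UUID is ever trusted.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum ReuseError {
    BackingPathMismatch,
    UuidMismatch,
}

/// What a probe observed about an already-open mapper (simplified).
struct OpenMapper {
    backing_path: String, // canonical path of the device behind the mapper
    luks_uuid: String,    // LUKS UUID read from that backing device
}

/// Decides whether an already-open expected mapper may be reused for the
/// requested device: backing path match first, then UUID match.
fn verify_reusable(
    mapper: &OpenMapper,
    requested_backing: &str,
    expected_uuid: &str,
) -> Result<(), ReuseError> {
    if mapper.backing_path != requested_backing {
        return Err(ReuseError::BackingPathMismatch);
    }
    if mapper.luks_uuid != expected_uuid {
        return Err(ReuseError::UuidMismatch);
    }
    Ok(())
}
```

In the clone case the UUIDs are equal by construction, so only the backing-path comparison can tell the two devices apart.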

Offline Disk State

A recorded member whose by-id path is present, whose LUKS header is readable, and whose on-disk LUKS UUID matches the pool.json membership key is identity verified. If that member is not assembled into the live btrfs pool, status and TUI surfaces render it as offline, distinct from missing (device absent) and unknown (braid cannot classify the state).

offline is deliberately cause-neutral. It can describe a locked member in a degraded mount, an interrupted post-commit mutation, or another state where membership and live btrfs topology have not yet been reconciled. Because those causes have different remedies, braid status does not print an Action: hint for offline rows.
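
The state distinction above could be sketched as a small classifier. Everything here is illustrative: MemberProbe and the Assembled variant are assumed names, and lumping an unreadable header or a UUID mismatch into Unknown is a simplification of this sketch, not something the text specifies for status rendering.

```rust
/// Display states for a recorded member (variant names partly assumed).
#[derive(Debug, PartialEq)]
enum DiskState {
    Assembled, // identity verified and part of the live btrfs pool
    Offline,   // identity verified, not assembled into the live pool
    Missing,   // device absent
    Unknown,   // cannot be classified
}

/// Simplified probe result for one recorded member.
struct MemberProbe {
    by_id_present: bool,
    header_uuid: Option<String>, // None when the LUKS header cannot be read
    in_live_pool: bool,
}

/// Classifies a member against its pool.json membership key.
fn classify(membership_key: &str, probe: &MemberProbe) -> DiskState {
    if !probe.by_id_present {
        return DiskState::Missing;
    }
    match probe.header_uuid.as_deref() {
        // Identity verified: present, readable, and UUID matches the key.
        Some(uuid) if uuid == membership_key => {
            if probe.in_live_pool {
                DiskState::Assembled
            } else {
                DiskState::Offline
            }
        }
        // Sketch-level assumption: anything else is unclassifiable here.
        _ => DiskState::Unknown,
    }
}
```

Note that the classifier reports what was observed and attaches no cause or remedy, matching the cause-neutral intent of offline.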

braid doctor’s declared_disks check also surfaces an offline member as a cause-neutral Warn, never Fail; Fail stays reserved for a live LUKS UUID mismatch. When the pool is mounted but live topology cannot be probed, doctor warns rather than claiming every declared member is assembled.

Limits And Non-Goals

  • A LUKS UUID identifies an encrypted LUKS container, not a physical drive, enclosure slot, SATA port, or by-id path.
  • A cloned LUKS header intentionally has the same UUID as its source. braid treats that as a duplicate identity and rejects it; it does not invent a new member identity for the clone.
  • Mapper and label drift are tolerated for correlation and cleanup, but braid does not silently rewrite drifted mapper names or labels back into the expected braid-<DiskName> form.
  • devid remains btrfs state. It is allowed only as a prior binding for missing/null-underlying cases where btrfs can still identify a member but braid cannot currently observe the LUKS UUID.
  • A member with neither observable LUKS UUID nor journaled/persisted devid is not recoverable by mapper-name inference. The right behavior is to preserve recovery state and require manual reconciliation.
  • UUIDs are not a user-facing naming scheme. They may appear in diagnostics, pool.json, pending-op.json, and machine-readable output, but command selection and normal summaries should continue to use DiskName.
  • lock’s mounted-fallback teardown unmounts the configured btrfs mount point (licensed by mount-point ownership, not an FSID identity match – braid persists no durable pool FSID to compare a probe against), then scans only /dev/mapper/braid-* and closes by backing LUKS UUID: verified member UUIDs close as members, verified non-member braid-* mappers close as orphans; non-braid-* devices and unverified candidates are skipped. The cleanup is scoped by the braid-* namespace plus UUID, not by which devices backed the unmounted filesystem. Consequence: a foreign btrfs at braid’s mount point would be unmounted (a non-destructive, EBUSY-safe umount with no -f/-l); a foreign filesystem normally sits on non-braid-* devices, so the realistic consequence is the unmount alone. This is accepted, and gating it would require a durable pool-FSID identity axis this decision deliberately omits to keep membership single-axis.

Tests That Enforce This

  • cli/src/membership.rs unit tests pin UUID-keyed pool.json, reject stale value-side luks_uuid, and enforce duplicate checks across UUID, name, by-id, and devid axes.
  • cli/src/types.rs and cli/src/cmd.rs unit tests reject user-supplied --uuid/--label extras and pin the structured cryptsetup luksFormat --uuid <uuid> --label <label> argv order.
  • cli/src/status.rs unit tests pin compact status names by resolving live pool UUIDs back to DiskName, including a drifted mapper case.
  • cli/src/status.rs and cli/src/tui/probe.rs unit tests pin that a present, LUKS-identity-verified member absent from the live pool renders offline, not missing or unknown.
  • cli/src/doctor.rs unit tests pin that declared_disks renders verified members absent from the live pool as cause-neutral Warn, keeps UUID mismatches as Fail, preserves offline-pool identity-only behavior, and warns when mounted-pool topology cannot be probed.
  • tests/cli/braid-status-rust.py pins that present disks’ rendered luks_uuid equals the real cryptsetup UUID and the pool.json membership key, and that name is the operator name, in intact and degraded states.
  • tests/cli/braid-status-rust.py pins that a degraded mount with one closed verified member renders that member as OFFLINE in human output and offline in JSON while the pool summary remains degraded.
  • tests/cli/braid-doctor-offline-member.py pins that a degraded mounted pool with one closed verified member makes declared_disks warn with offline wording, while a fully assembled pool and an offline pool remain Ok.
  • tests/cli/status-mapper-drift.py pins that braid status resolves the operator name via the UUID join when a member is open under a drifted mapper (braid-WRONG), not the mapper basename, in both JSON and human output.
  • cli/src/tui/probe.rs unit tests pin the TUI Data-tab Bus column’s transport join to the parent disk’s LUKS UUID, so a member open under a drifted mapper (braid-WRONG) still renders its bus instead of degrading to --.
  • cli/src/tui/probe.rs unit tests pin that the disk-detail LUKS metadata dump reads the live backing path for a verified-present member (surviving by-id drift), and that a foreign / ownership-unverified mapper does not surface the live device’s metadata under the declared disk.
  • cli/src/tui/probe.rs and cli/src/tui/browse/state.rs unit tests pin that the TUI Browse SMART picker resolves a verified-present member through its live backing path (PoolState.disk_underlying, shared with the Data-tab SMART loop) and an offline member through its persisted by-id handle, so the two SMART surfaces cannot disagree under by-id drift. tests/cli/braid-tui-browse.py pins the live /dev/vd* node end-to-end for a present, unlocked member.
  • cli/src/lock.rs unit tests pin the normal UUID/devid-classified close set, observed-mapper closing, UUID-scanned fallback cleanup, orphan warnings for non-member UUID/devid cases, duplicate-devid null_underlying skip behavior, and skip warnings for unverified candidates.
  • cli/src/remove.rs unit tests pin all live member devids into the pre-operation journal snapshot before mutation, so recovery has a legitimate fallback binding when LUKS UUID is not observable.
  • cli/src/recover.rs unit tests verify recovery refuses a null-underlying member when the journal lacks both observable UUID and persisted devid, instead of falling back to mapper-name inference.
  • cli/src/enroll_key_file.rs unit tests verify standalone enroll rejects a member whose live LUKS UUID does not match the pool.json membership key before any slot inventory or keyfile mutation runs. Enroll also re-probes each member’s live UUID again at its mutation boundary, after the passphrase prompt and before luksAddKey, to catch a disk swapped or reformatted during the prompt window: unit tests pin the standalone re-probe’s mismatch and fail-closed arms and the discovery->execute window closure (a swap that passes discovery is rejected at execute before any keyfile is enrolled or generated).
  • cli/src/replace.rs unit tests verify ReplacePlan::execute re-probes the live pool before journal write, rejects unmounted/FSID-drifted/colliding live-pool state, and still proceeds when the fresh probe is clean.
  • cli/src/recover.rs unit tests verify post-maintenance replace recovery re-probes the old mapper UUID before close, skips foreign mappers, and still closes owned active dm mappings without relying on /dev/mapper path nodes.
  • cli/src/luks.rs and cli/src/probe.rs unit tests verify already-open expected mappers must have the requested backing path before UUID ownership is accepted.
  • tests/cli/luks-mapper-drift.py verifies braid lock closes the observed drifted mapper owned by a member UUID.
  • tests/cli/luks-lock-skipped-no-false-closed.py verifies skipped mapper uncertainty does not produce false already closed rows.
  • tests/cli/unlock-uuid-mismatch.py, tests/cli/enroll-uuid-mismatch.py, and tests/cli/recover-replace-existing-luks-uuid-mismatch.py verify swapped or reformatted disks fail UUID re-checks before unsafe replay, slot enrollment, or mount.
  • tests/cli/replace-new-in-pool-guard.py verifies duplicate LUKS UUIDs are rejected before braid writes membership or calls into btrfs mutation.
  • tests/cli/replace-live-pool-collision-race-rejected.py verifies replace’s execute-time live-pool re-probe rejects a cloned replacement UUID added to the mounted pool while replace waits for confirmation.
  • tests/cli/braid-add-cloned-luks-header-rejected.py and tests/cli/replace-cloned-luks-header-rejected.py verify cloned LUKS headers cannot make add or replace reuse a mapper opened from the wrong physical device.
  • tests/cli/braid-add-persists-before-balance.py verifies fresh add writes canonical UUID-keyed membership, without a duplicate value-side luks_uuid, before post-add maintenance continues.
  • tests/cli/braid-doctor-uuid-swap.py verifies braid doctor fails closed when a member’s live LUKS UUID diverges from its pool.json key, surfacing the swap before any mutating command runs.

Consequences

  • pool.json key order is UUID order, not disk-name order. Display surfaces that need stable operator ordering must sort by DiskName.
  • Recovery trusts journaled UUID-keyed membership snapshots for phase-specific replay and verifies live UUIDs again at mutation boundaries where a physical disk could have been swapped or reformatted.
  • Mapper and label drift no longer break membership correlation, but drifted handles are not silently reconciled back into membership.
  • Cloned disks with duplicate LUKS UUIDs are rejected before membership is written.
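
The first consequence is easy to trip over, so here is a small sketch of it, assuming a simplified map from UUID string to disk name. Iterating the map yields UUID order; a display surface that wants stable operator ordering sorts the rows by name explicitly.

```rust
use std::collections::BTreeMap;

/// Returns (disk name, UUID) rows sorted by disk name rather than by the
/// map's own UUID key order. The map shape is simplified for this sketch.
fn display_order(membership: &BTreeMap<String, String>) -> Vec<(&str, &str)> {
    let mut rows: Vec<(&str, &str)> = membership
        .iter()
        .map(|(uuid, name)| (name.as_str(), uuid.as_str()))
        .collect();
    // Tuples sort by their first element first, i.e. by disk name.
    rows.sort();
    rows
}
```

The helper name display_order is hypothetical; the sort itself is the only behavior the consequence requires.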

Rejected Alternatives

  1. Keep disk name as identity. Disk names are useful for humans but are not intrinsic to the encrypted device. Keeping them as identity preserves the label/mapper drift hazard.
  2. Use by-id as identity. by-id paths identify hardware slots/devices, not encrypted membership. They can change with enclosures or controller behavior, and they do not detect cloned LUKS headers.
  3. Use btrfs devid as identity. Devids are live filesystem state and are unavailable before mount. They remain useful only as fallback binding for missing or null-underlying devices.

Decision: Browse Tab Is Raw Output, Curated Tabs Are First-Class UX

Context

The original standalone browse command exposed low-level btrfs command output in its own TUI. The main braid tui later grew real top-level tabs for curated pool views such as Data and Scrub. Keeping a separate browse runtime duplicated input handling, event filtering, command-generation guards, and snapshot tests.

At the same time, not every useful operator view should become a polished TUI panel immediately. Some data is most useful as complete raw command output while braid is still learning which parts deserve a first-class workflow.

Decision

braid tui owns the interactive UI surface. The standalone browse command is removed, and its raw inspection workflow lives as the Browse top tab.

The top tabs are:

  • Data – curated pool and disk health UX.
  • Scrub – curated scrub status UX.
  • Browse – raw CLI output inspector.

Browse is intentionally low-level and pass-through. It may overlap curated tabs because overlap is not duplication here: Browse answers “what did the underlying tool say?” while curated tabs answer “what should the operator understand or do next?”

Features graduate out of Browse only when they need dedicated interaction, history, safety checks, progress semantics, or domain-specific summaries. Until then, Browse is the holding area for complete command output.

Consequences

  • Raw command coverage and parser canaries must exercise Browse through braid tui, not a separate non-interactive browse command.
  • The Browse tab can expose Btrfs and NUT commands even when related curated panels already exist.
  • New Browse entries should be append-only within the Browse program/command menus unless they are promoted to a curated tab with a separate design reason.

Decision: Pool Lock Is Rust-Owned

Context

The shell wrapper originally serialized selected CLI operations by taking /run/braid-pool.lock before execing the Rust binary. It also performed post-success lifecycle work such as mount-point permissions and braid-online.service activation/deactivation.

That split created two sources of truth:

  • The wrapper had to know which Rust subcommands mutate pool state.
  • Rust had to know which commands read pool.json, prompt, probe devices, or write recovery journals.

The lists drifted. lock and enroll needed the same early serialization as other mutators but were not naturally owned by the wrapper’s subcommand case logic. Wrapper ownership also made it easier for Rust dispatch to grow a pre-lock state read later, which would violate the stale-state invariant.

Decision

Rust dispatch (cli/src/main.rs) owns pool-operation locking. The lock_policy function in cli/src/main.rs is the single source of truth for mapping Commands variants to lock acquisition disciplines. Its wildcard-free exhaustive match makes every new subcommand choose a discipline at compile time. For commands whose policy acquires the pool lock, dispatch acquires /run/braid-pool.lock before loading config, loading membership, probing pool state, prompting, or writing journals.
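
The compile-time property of a wildcard-free match can be shown with a reduced sketch. The Commands subset, the LockPolicy variants, and in particular the policy chosen for Status are assumptions made for illustration; only the shape of the match reflects the text.

```rust
/// Hypothetical subset of the subcommand enum.
enum Commands {
    Add,
    Replace,
    Lock,
    Enroll,
    Status,
}

/// Hypothetical acquisition disciplines; the real set may be richer.
#[derive(Debug, PartialEq)]
enum LockPolicy {
    PoolLock,
    NoLock,
}

/// Maps every subcommand to a discipline. There is deliberately no `_ =>`
/// arm: adding a variant to `Commands` fails to compile until a policy is
/// chosen for it here.
fn lock_policy(cmd: &Commands) -> LockPolicy {
    match cmd {
        Commands::Add | Commands::Replace | Commands::Lock | Commands::Enroll => {
            LockPolicy::PoolLock
        }
        // Assumption for this sketch only.
        Commands::Status => LockPolicy::NoLock,
    }
}
```

Dispatch would consult lock_policy before any config load, membership load, probe, prompt, or journal write, so the policy table is also the ordering guarantee.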

The shell wrapper is a pure exec shim. It only sets the module-controlled PATH and execs the packaged Rust binary.

braid-online.service uses a distinct shutdown entry point:

braid lock --systemd-stop --deadline-secs <n>

The module option braid.lockSystemdStopDeadlineSecs controls <n>. Its default is 270 seconds, and the module asserts that it is strictly below braid-online.service TimeoutStopSec (300 seconds).

Lifecycle work also lives under the Rust-held pool lock:

  • After every unlock, add, and recover attempt, success or failure, dispatch runs mark_online as a finalizer. The is_mountpoint gate inside mark_online short-circuits when the operation failed before mounting; the bootstrap-add and recover cases where the mount succeeded but a later step returned Err are exactly where this finalizer matters.
  • Plain braid lock calls mark_offline after successful unmount/close.
  • The lock path stops lifecycle-bound scrub units and BoundBy braid-online.service consumers before unmounting.

Systemd lifecycle synchronization is gated by systemd_lifecycle = true in runtime config. modules/braid/cli.nix emits that flag for module-managed installs; standalone CLI configs omit it and therefore skip braid-online.service, scrub-unit, and BoundBy systemctl calls. The pool lock and pool_access_group mount-root permission fixups still run outside that gate.

Snapshot Rule On systemctl start

mark_online snapshots braid-online.service ActiveState at the start of the pool-lock window. It starts the unit only if the snapshot was inactive or failed.

It must skip active, activating, and deactivating. The deactivating case is load-bearing: if a stop job is already running and its ExecStop needs the pool lock, a new systemctl start braid-online.service would queue behind that stop. If the caller already holds the pool lock, the queued start can deadlock against the in-flight stop. Snapshot gating prevents the start from being queued in that state.

Unknown snapshot results warn instead of starting. The pool remains mounted and usable, but automatic shutdown cleanup may be missing.
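
The snapshot rule amounts to a three-way decision on the snapshotted ActiveState. The sketch below uses hypothetical names (parse_active_state, decide_start, StartDecision) and treats any state string other than the five named in this section as unknown.

```rust
/// Snapshot of the unit's ActiveState as this section distinguishes it.
#[derive(Debug, PartialEq, Clone, Copy)]
enum ActiveState {
    Active,
    Activating,
    Deactivating,
    Inactive,
    Failed,
    Unknown,
}

#[derive(Debug, PartialEq)]
enum StartDecision {
    Start, // queue `systemctl start`
    Skip,  // do nothing; the unit is up or in transition
    Warn,  // do not start; tell the operator cleanup may be missing
}

/// Parses an ActiveState string; anything unrecognized becomes `Unknown`.
fn parse_active_state(raw: &str) -> ActiveState {
    match raw.trim() {
        "active" => ActiveState::Active,
        "activating" => ActiveState::Activating,
        "deactivating" => ActiveState::Deactivating,
        "inactive" => ActiveState::Inactive,
        "failed" => ActiveState::Failed,
        _ => ActiveState::Unknown,
    }
}

/// Applies the snapshot rule. `Deactivating` must skip: a start queued
/// behind an in-flight stop can deadlock against the held pool lock.
fn decide_start(snapshot: ActiveState) -> StartDecision {
    match snapshot {
        ActiveState::Inactive | ActiveState::Failed => StartDecision::Start,
        ActiveState::Active | ActiveState::Activating | ActiveState::Deactivating => {
            StartDecision::Skip
        }
        ActiveState::Unknown => StartDecision::Warn,
    }
}
```

The snapshot is taken once at the start of the pool-lock window, so the decision is a pure function of that single observation.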

Lock Tolerates Missing Or Corrupt Membership

Lock-side dispatch loads pool membership from pool.json only; it consults no recovery journal. If pool.json is missing, unreadable, corrupt, or fails its uniqueness checks, lock does not abort – it warns and proceeds with empty membership. On the live plain-lock and braid-online.service ExecStop paths the warning goes to stderr; under --dry-run it is folded into the stdout preview to preserve the single-stream dry-run contract (ADR 022).

Membership is advisory for lock, not authoritative – its only role here is to attach friendly member names to status output. What lock closes is decided from observed state, not from pool.json:

  • mappers backing the live mounted pool, proven during the per-device probe by cryptsetup status + cryptsetup luksUUID;
  • mounted-pool members whose backing device is gone (device: (null)), matched by their persisted btrfs device id;
  • otherwise-stranded /dev/mapper/braid-* mappers, each confirmed by cryptsetup status + cryptsetup luksUUID (see ADR 024) before it is closed.

With empty membership these mappers classify as unnamed orphans rather than named members and are still closed. Fallback scanning is limited to /dev/mapper/braid-*; mounted-pool cleanup closes only the mapper paths reported by the pool mounted at the configured mount point. A candidate that fails verification, a /dev/mapper scan that fails, or a duplicate-devid conflict is skipped with a warning and may leave cleanup incomplete – the operator resolves it by re-running braid lock or reconciling pool.json.

This closes the failed-bootstrap-add lifecycle hole without a journal. A bootstrap add can mount the pool and open its LUKS mappers, then fail before braid writes the first pool.json. If shutdown follows, braid-online.service ExecStop runs braid lock --systemd-stop, finds no pool.json, and still unmounts and closes those mappers – because what to close is read from the live mounted pool and the observed mappers, not from pool.json.

Lock therefore needs no special case for which operation was interrupted. An interrupted Remove, RemoveMissing, Replace, or live-pool Add is reconciled by braid recover against its pending-op.json journal; lock neither reads nor needs that journal to perform safe shutdown cleanup.

Stop Coordinator + Done Protocol

Plain braid lock acquires /run/braid-stop-coordinator.lock before the pool lock. After cmd_lock finishes unmounting and closing LUKS, it writes done\n to that coordinator file and then synchronously stops braid-online.service.

The recursive ExecStop reentry runs braid lock --systemd-stop --deadline-secs <n>. If the stop coordinator is held, the reentry polls until one of three outcomes occurs:

  • done\n, which means the plain lock already completed the disk cleanup and the reentry can exit 0 immediately.
  • coordinator release without done\n, in which case it may proceed to acquire the pool lock and run the cleanup itself.
  • deadline expiry, in which case it exits 1 before systemd’s TimeoutStopSec can kill it.

This protocol replaces the stop-side snapshot gate from ADR 018 for plain braid lock’s mark_offline. The synchronous stop is intentional: user invocations should return only after braid-online.service is inactive, while recursive ExecStop has a deterministic poll-out path instead of queuing behind itself.
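
One polling iteration of the reentry can be sketched as a pure decision function. The names are hypothetical, and the precedence (an observed done or release wins over an expired deadline) is an assumption of this sketch rather than something the text states.

```rust
use std::time::Duration;

/// What one poll of the stop coordinator observed (simplified).
#[derive(Debug, PartialEq, Clone, Copy)]
enum Coordinator {
    HeldWithoutDone, // plain lock still working
    Done,            // "done\n" was written
    Released,        // coordinator released without "done\n"
}

#[derive(Debug, PartialEq)]
enum Step {
    KeepPolling,
    ExitOk,       // exit 0: cleanup already completed by plain lock
    RunCleanup,   // acquire the pool lock and clean up ourselves
    ExitDeadline, // exit 1 before TimeoutStopSec can kill the process
}

/// Decides the next step for the --systemd-stop reentry.
fn next_step(observed: Coordinator, elapsed: Duration, deadline: Duration) -> Step {
    match observed {
        Coordinator::Done => Step::ExitOk,
        Coordinator::Released => Step::RunCleanup,
        Coordinator::HeldWithoutDone if elapsed >= deadline => Step::ExitDeadline,
        Coordinator::HeldWithoutDone => Step::KeepPolling,
    }
}
```

With the default deadline of 270 seconds against a TimeoutStopSec of 300 seconds, the ExitDeadline branch leaves the process 30 seconds to exit on its own terms.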

Between writing done\n and stopping braid-online.service, mark_offline re-checks mountpoint -q and treats a check failure (e.g. OnlineError::Spawn mid-shutdown) as still-mounted: it warns and skips the stop, leaving braid-online.service active. The operator can re-run braid lock or systemctl stop braid-online.service to recover. This mirrors the “unknown snapshot results warn instead of starting” rule from the Snapshot Rule On systemctl start section: when state is unknown, the fail-safe direction is to leave the lifecycle owner active rather than deactivate over a possibly live pool.

Consequences

  • There is a single source of truth for the locked-command list and acquisition discipline: lock_policy in Rust dispatch.
  • The wrapper cannot drift from Rust command semantics because it no longer interprets subcommands.
  • Lock acquisition is the first real execution boundary for covered commands. Environment failures such as lock contention happen before recovery journals.
  • mark_online must keep the start-side snapshot rule to avoid the deactivating deadlock.
  • mark_offline must keep the stop coordinator and done\n protocol because it deliberately uses synchronous systemctl stop after cleanup.
  • The pool lock is independent from the sleep inhibitor. The lock prevents stale concurrent state reads; the inhibitor still protects only the non-interruptible mutation window described in ADR 019.

Decision: Pin block-group-tree at mkfs time

Context

braid pins its toolchain to nixos-26.05’s btrfs-progs 6.19.1, whose default mkfs feature set enables block-group-tree. braid requests that one feature bit explicitly rather than inheriting it from the default, so the on-disk feature set is determined by braid and not by whichever btrfs-progs the running toolchain links. (The flag predates the 26.05 bump: under the older nixos-25.11 btrfs-progs 6.17.1, which did not default block-group-tree, the same flag made new pools forward-compatible with the 6.19 default.)

This pin is deliberately narrow. mkfs.btrfs still starts from the linked btrfs-progs default feature set; braid only adds block-group-tree to that set. The rest of the on-disk feature set continues to track btrfs-progs defaults.

Decision

cli/src/cmd.rs passes -O block-group-tree on both mkfs.btrfs invocations: single-disk bootstrap and RAID1 bootstrap. New pools carry the block-group-tree bit explicitly – matching the btrfs-progs 6.19 default that braid’s pinned toolchain ships – without freezing any other mkfs default.

The long form is preferred over the bgt alias because it is the documented primary name and matches the kernel sysfs entry block_group_tree.
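
For illustration, argv construction with the pinned feature flag might look like the sketch below. Only -O block-group-tree comes from this decision; the function names, the position of the flag, and the RAID1 profile flags (-d raid1 -m raid1) are assumptions about how such a builder could be written, not braid's exact argv.

```rust
/// The one feature bit braid requests explicitly, in its long form.
const FEATURE_FLAG: [&str; 2] = ["-O", "block-group-tree"];

/// Sketch of a single-disk mkfs.btrfs argv.
fn mkfs_btrfs_single_argv(device: &str) -> Vec<String> {
    let mut argv = vec!["mkfs.btrfs".to_string()];
    argv.extend(FEATURE_FLAG.iter().map(|s| s.to_string()));
    argv.push(device.to_string());
    argv
}

/// Sketch of a RAID1 mkfs.btrfs argv: data and metadata profiles set to
/// raid1 (assumed), followed by every member device.
fn mkfs_btrfs_raid1_argv(devices: &[&str]) -> Vec<String> {
    let mut argv = vec!["mkfs.btrfs".to_string()];
    argv.extend(FEATURE_FLAG.iter().map(|s| s.to_string()));
    argv.extend(["-d", "raid1", "-m", "raid1"].iter().map(|s| s.to_string()));
    argv.extend(devices.iter().map(|s| s.to_string()));
    argv
}
```

Sharing one FEATURE_FLAG constant between both builders keeps the two invocations from drifting apart, which is the property the exact-argv unit tests pin.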

Where this is enforced

  • cli/src/cmd.rs – MkfsBtrfs and MkfsBtrfsRaid1 build the mkfs.btrfs argv with -O block-group-tree.
  • cli/src/cmd.rs – mkfs_btrfs_single_generates_correct_argv and mkfs_btrfs_raid1_generates_correct_argv assert the exact argv.
  • tests/module/mkfs-block-group-tree.{nix,py} – VM coverage asserts the on-disk feature bit after braid add creates single-disk and RAID1 pools.

Notes

  • block-group-tree is a compat_ro feature. The kernel rejects unsupported compat_ro bits for read-write mount but may still allow a read-only mount if no log replay is required. The kernel-side feature has been available since 6.1; NixOS 26.05 ships kernel 6.18, so normal braid read-write operation is always supported.
  • Existing pools created before this pin are unaffected. Offline conversion is possible via btrfstune --convert-to-block-group-tree; braid does not wrap that.
  • Forward-compat note: a rescue boot from very old live media (kernel <6.1) cannot read-write mount a block-group-tree pool. A read-only mount may still succeed if no log replay is needed. This is not a blocker because braid does not ship rescue media, but the constraint should stay visible.

Decision: Seal the offline pool mountpoint immutable

Context

The pool mountpoint (default /mnt/storage) is a plain directory on the root filesystem. When the pool is mounted there, writes go to the pool; when it is NOT mounted, that bare directory is still writable, so any process writing under the path silently lands data on the ROOT disk. When the pool later mounts over it, that data is shadowed (invisible), permanently consumes root space, and the write looked like it succeeded. This is the classic “unmounted mountpoint” data-safety bug.

braid sets the inode immutable attribute (FS_IMMUTABLE_FL, a.k.a. chattr +i) on the bare mountpoint directory while it is unmounted:

  • Unmounted: a create/write under the directory fails immediately with EPERM.
  • A filesystem can still be mounted OVER an immutable directory; once mounted, the mounted filesystem’s own root inode governs writes, so normal pool writes work.
  • The attribute is persistent inode metadata (survives unmount and reboot).
  • Setting it requires CAP_LINUX_IMMUTABLE; braid already runs privileged.

braid is the correct owner because the invariant has a hard timing rule:

Only ever set +i when the path is NOT currently a mountpoint. Setting it on a mounted path seals the MOUNTED filesystem’s own root inode – blocking all pool writes and persisting on the pool until cleared.

braid knows the mount state and controls the lifecycle, so it can honor that rule reliably. A bare tmpfiles chattr +i hack could not: it would seal the live pool root during a nixos-rebuild switch performed while the pool is mounted. braid’s unit gates on ConditionPathIsMountPoint=! and the in-CLI fd STATX_ATTR_MOUNT_ROOT check, so it only ever seals the offline bare dir.

Mechanism (verified against the pinned kernel)

  • Mount-over-immutable is allowed. There is no IS_IMMUTABLE check in the kernel mount path (reference/linux/fs/namespace.c); the guard lives only in fs/attr.c. So the pool mounts over the sealed dir.
  • +i blocks metadata writes. may_setattr (reference/linux/fs/attr.c) returns -EPERM for chmod/chown/explicit-time changes on an immutable inode – the basis for the tmpfiles interaction below.
  • The kernel refuses rmdir of an immutable dir. may_delete -> IS_IMMUTABLE -> -EPERM (reference/linux/fs/namei.c), so a sealed offline mountpoint cannot be silently removed and recreated mutable while offline.
  • The fd-based mount-root check uses statx’s STATX_ATTR_MOUNT_ROOT, which is authoritative: unlike an st_dev-vs-parent comparison it also detects same-device and bind mountpoints (util-linux’s own mountpoint.c notes its st_dev fallback “is … not able to detect bind mounts”).

Decision

1. Always-on (non-configurable)

The seal is an unconditional safety invariant, in the same class as the baked-in base mount options braid sets unconditionally – noatime (ADR 015) and skip_balance (Principles). There is no immutableWhenUnmounted knob.

Rationale: there is no legitimate “off” use case (writing the bare offline mountpoint is the bug). The escape hatches that matter – graceful degradation on an unsupported fs / old kernel (Unsupported / MountStateUnknown) and the braid seal-mountpoint --unseal <path> lever – exist independently of any flag.

Tradeoff: the only capability lost is a declarative, rebuild-time off switch. Recovery from any unforeseen interaction is the manual --unseal plus the graceful self-disable, not a NixOS option flip. The always-on default is reversible later if a concrete need ever surfaces (a knob could be re-added trivially).

2. Close the boot window

A boot-time seal makes the invariant hold from boot, not only after the first unlock. A NAS waiting for SSH unlock (auto-unlock off, or USB key absent – braid-auto-unlock.service exits 0 on skip) otherwise sits offline-and-writable indefinitely, and an unlock-path seal would never fire because nothing mounts.

3. Seal from the boot/activation unit ONLY

The seal lives in exactly one place: the braid-seal-mountpoint oneshot (modules/braid/storage.nix). braid add does NOT seal, and neither does the mount path. This is not a coverage gap – a create-time seal would be a redundant AlreadyImmutable no-op – for two compounding reasons:

  • The oneshot runs on every activation, not just reboot. braid-seal-mountpoint.service is Type=oneshot with no RemainAfterExit, so it returns to inactive (dead) once ExecStart exits (reference/systemd/man/systemd.service.xml). NixOS’s switch-to-configuration-ng starts all active targets and systemd re-enqueues their inactive (dead) Wants= dependencies, so the dead oneshot is started again on every nixos-rebuild switch/test as well as every boot (self-healing). You cannot enable braid or change braid.mountPoint without an activation that runs the seal.
  • The mountpoint is static and pre-exists every pool. cfg.mountPoint is a single fixed path created by the tmpfiles rule d ${cfg.mountPoint} on every boot/activation, so the seal unit seals it (while offline) BEFORE any braid add can run. The pool then mounts OVER the already-sealed dir; +i persists on the underlying inode, and braid’s lock/unmount path never rmdirs or chmod/chowns the bare dir, so the next braid lock reveals it still sealed.

So any pool bootstrapped after braid is enabled inherits an already-sealed mountpoint, and persistence carries the seal across every later unlock/lock with no re-seal. The seal is NOT in the create/bootstrap path or the bring-online mount path; the only seal call outside braid seal-mountpoint is the doctor’s read-only probe.

The braid-seal-mountpoint unit is ordered before braid-auto-unlock.service. Both are pulled in by multi-user.target; without the edge they race, and if auto-unlock won it would mount the pool and the seal unit’s ConditionPathIsMountPoint=! would then skip the seal. An auto-unlock-with-USB NAS never boots offline, so without this edge nothing would ever seal the bare dir. Ordering before auto-unlock runs the seal in the pre-mount window every boot; auto-unlock then mounts over the sealed dir and persistence carries it. When autoUnlock is disabled the unit does not exist and before is a harmless no-op ordering string.

The doctor “offline + mutable -> Warn” check is the detection/self-heal signal for the rare out-of-band unseal (e.g. a raw chattr -i); the next boot or activation re-seals.

Static-vs-dynamic mountpoint distinction (Rockstor precedent)

Rockstor (a btrfs NAS) ships create-time sealing – commit 5836560bbd1430c99fc73e3b6408fe3dcfd2220b, “Make top level mount directories read-only when unmounted. Fixes #1414” – BECAUSE its mountpoints are dynamic per-object /mnt2/<name> dirs born at creation with no boot-time existence to seal, and it has no boot re-seal. braid’s single static mountpoint plus an activation/boot oneshot that fires before any create makes boot-only sufficient and create-time redundant; braid’s boot re-seal also fixes Rockstor’s fragility (create-only sealing never recovers from an out-of-band chattr -i).

Rockstor validates the MECHANISM: its bind_mount does mkdir -> chattr +i -> mount --bind over the sealed dir (mount-over-immutable), and teardown does chattr -i -> rmdir (the kernel refuses rmdir of an immutable dir – the same basis as braid’s --unseal lever).

Revisit-if: if braid ever moves away from the single static mountpoint (e.g. per-subvolume mounts at distinct root-fs paths, born on demand like Rockstor’s), create-time sealing becomes necessary and this decision should be revisited.

Maintenance levers

braid seal-mountpoint is a visible command (cli/src/main.rs) with three forms (cli/src/mountpoint_guard.rs):

  • braid seal-mountpoint (no args) – the bare boot/internal form. Seals the configured mount_point. Best-effort: it always exits 0 (a missing/inert guard must not block boot) and is lock-free. This is what the oneshot runs.
  • braid seal-mountpoint <path> – seal an explicit path. Lock-free, but reports an HONEST desired-state exit code: exit 0 iff the path ends up immutable (Set or AlreadyImmutable), non-zero otherwise. This is the remedy for separate-path subvolume mountpoints (below), where a silent best-effort exit 0 would hide an unprotected path the doctor cannot see.
  • braid seal-mountpoint --unseal <path> – clear +i on an explicit path. Unlike the seal forms this is an operator remediation, not a boot action, so it (a) ACQUIRES the pool lock (fail-fast on contention), serializing against an in-flight unlock/lock so a concurrent mount cannot land the pool over a just-cleared bare dir; (b) REFUSES the currently configured mount_point (the live path must stay sealed while offline); (c) exits non-zero unless the path ends up mutable (Cleared or AlreadyMutable, so a repeat unseal of an orphan reports success).

All three forms route through the same fd-guarded enforce (cli/src/mountpoint_guard.rs#enforce), which refuses any live mountpoint (SkippedMounted) via STATX_ATTR_MOUNT_ROOT, so the levers only ever touch an offline bare dir.
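
The shared guard and the per-form exit codes can be modelled as two pure functions. This is a sketch under stated assumptions, not the code in cli/src/mountpoint_guard.rs: the outcome names come from this decision, but the MountState, Attr, and Form types, the function signatures, and the concrete non-zero value 1 are hypothetical. It leaves out the Absent outcome, the pool lock taken by --unseal, and the refusal of the configured mount_point.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MountState { Mounted, Offline, Unknown }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Attr { Immutable, Mutable, Unsupported }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Form { BootSeal, SealPath, UnsealPath }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Outcome {
    Set, AlreadyImmutable, Cleared, AlreadyMutable,
    SkippedMounted, Unsupported, MountStateUnknown,
}

/// Decide what the guard does. A live mountpoint is refused for every form,
/// so the levers only ever touch an offline bare directory.
fn enforce(form: Form, mount: MountState, attr: Attr) -> Outcome {
    match (mount, attr) {
        (MountState::Mounted, _) => Outcome::SkippedMounted,
        (MountState::Unknown, _) => Outcome::MountStateUnknown,
        (MountState::Offline, Attr::Unsupported) => Outcome::Unsupported,
        (MountState::Offline, Attr::Immutable) => match form {
            Form::UnsealPath => Outcome::Cleared,
            _ => Outcome::AlreadyImmutable,
        },
        (MountState::Offline, Attr::Mutable) => match form {
            Form::UnsealPath => Outcome::AlreadyMutable,
            _ => Outcome::Set,
        },
    }
}

/// Map an outcome to a process exit code for each command form.
fn exit_code(form: Form, outcome: Outcome) -> i32 {
    let ok = match form {
        // Best-effort boot form: never block boot.
        Form::BootSeal => true,
        // Honest desired-state codes for the explicit-path forms.
        Form::SealPath => matches!(outcome, Outcome::Set | Outcome::AlreadyImmutable),
        Form::UnsealPath => matches!(outcome, Outcome::Cleared | Outcome::AlreadyMutable),
    };
    if ok { 0 } else { 1 }
}
```

In this model a sealed-while-mounted request yields SkippedMounted, which the boot form reports as 0 and the explicit-path form reports as non-zero.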

Doctor detection

braid doctor is the sole non-boot detection signal under the boot-only model. The pure classifier cli/src/doctor.rs#classify_mountpoint_immutability warns when the pool is offline and the mountpoint is mutable (invariant not yet held – self-seals on the next boot/activation, or run braid seal-mountpoint), and fails when the pool is mounted and the inode is immutable (a live pool root was sealed – a tripwire that should never fire). Both the mount-state and immutability inputs are tri-state, so a failed probe or an unsupported root suppresses the finding rather than producing a misleading hint – the seal unit owns the single “protection unavailable” warning.
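
The classifier's truth table is small enough to write out. The following is a hypothetical stand-in for classify_mountpoint_immutability, assuming a three-valued Probe input and a Finding result type; the real types and signature in cli/src/doctor.rs may differ.

```rust
/// Tri-state probe result: the answer is yes, no, or could not be determined
/// (failed probe, unsupported root filesystem).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Probe { Yes, No, Unknown }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Finding { Pass, Warn, Fail, Suppressed }

/// Pure classification of (pool mounted?, mountpoint inode immutable?).
fn classify(mounted: Probe, immutable: Probe) -> Finding {
    match (mounted, immutable) {
        // Any unknown input suppresses the finding instead of guessing.
        (Probe::Unknown, _) | (_, Probe::Unknown) => Finding::Suppressed,
        // Offline and mutable: invariant not yet held.
        (Probe::No, Probe::No) => Finding::Warn,
        // Mounted and immutable: a live pool root was sealed.
        (Probe::Yes, Probe::Yes) => Finding::Fail,
        // Offline and sealed, or mounted and writable: the expected states.
        (Probe::No, Probe::Yes) | (Probe::Yes, Probe::No) => Finding::Pass,
    }
}
```

Keeping the function pure means every cell of the table can be unit-tested without touching a filesystem.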

Caveats

External writers (intended behavior change)

This is a behavior change for operator-configured services, not a no-op. On a NAS, services like Samba/NFS exports, Syncthing, Nextcloud, or cron/rsync backups are routinely wantedBy multi-user.target and will write to /mnt/storage while the pool is offline (auto-unlock skipped or USB absent, awaiting SSH unlock). With +i those writes now fail with EPERM. That is the intended win: a loud EPERM replaces the silent write-to-root that leaked space and got shadowed on mount. An operator whose backup/share service runs while the pool is offline should expect the new EPERM.

Sole-mounter / fstab assumption

This invariant assumes braid is the only thing mounting the path. The module replaced the fileSystems entry, so braid is the sole mounter by design – there is no fstab entry racing it. If an operator adds their own fstab line or mount unit for the pool, external mount/unmount can bypass the seal and the invariant can drift; the doctor check is the detection mechanism.

Reconfiguration (changing mountPoint)

braid seals and checks only the CURRENTLY configured mount_point. If an operator changes braid.mountPoint (say /mnt/storage -> /srv/storage), the nixos-rebuild switch that applies the change runs the seal oneshot for the NEW path during that same activation, so the new path is sealed promptly. braid does NOT auto-clear the OLD one – the old bare directory keeps its +i until cleared, so a later rmdir or reuse of the old path fails with EPERM. This is the same class as any NixOS path option (changing dataDir leaves the old directory behind); braid does not track prior mountpoints.

Remediation is the explicit-path clear lever (not chattr, which is absent from the appliance wrapper PATH): braid seal-mountpoint --unseal /mnt/storage. The old path is offline, so the fd guard clears it safely, and --unseal refuses only the currently configured mount_point (now /srv/storage), so clearing the OLD, no-longer-configured path is allowed. The doctor cannot surface the orphaned old path (without a recorded prior mountpoint it has nothing to probe), so discoverability is via this doc and the EPERM-on-rmdir symptom, by design.

Separate-path subvolume mounts (not auto-sealed)

The boot seal covers ONLY cfg.mountPoint. braid documents and tests a pattern (Mounting subvolumes) that mounts subvolumes at SEPARATE root-fs paths – e.g. /var/lib/jellyfin/media – via systemd.mounts with bindsTo = braid-online.service. When the pool is offline those mount units are stopped, leaving bare root-fs directories at those paths, so an undocumented writer there lands data on root – the identical bug, NOT covered by the boot oneshot (it seals one static path).

  • Subvolumes mounted UNDER the sealed /mnt/storage are inherently protected by the parent seal and are the safe default.
  • Subvolumes mounted at separate paths are an advanced, operator-opt-in pattern. This decision does NOT auto-seal them; it documents the limitation and points operators at the manual braid seal-mountpoint <path> lever (whose honest exit codes matter precisely because the doctor cannot see these paths).

The manual lever is honestly half-protective (not self-healing, and the doctor cannot see these paths). Revisit-if: a fully-declarative braid.extraSealedMountPoints list that the boot/activation oneshot would seal alongside cfg.mountPoint (with the same auto-seal + re-seal + doctor coverage). It is additive – it does not reopen Decision 1’s no-knob stance – but it is a real new public option with non-trivial scope (a multi-path seal loop, per-path doctor coverage, and a correctness wrinkle the static pool mountpoint does not have: a systemd.mounts target dir may not exist until first mount, so an offline-before-first-mount path reports Absent until created). Deferred until the manual lever proves insufficient.

Filesystem support

FS_IMMUTABLE_FL is effectively universal on real Linux roots (btrfs/ext4/xfs/f2fs/tmpfs all implement .fileattr_set). The Unsupported self-disable realistically fires only on non-NAS roots (vfat/9p/nfs), so it is a genuine but rare escape hatch, not a central rationale pillar. When it fires the seal unit emits one clear “root filesystem does not support the immutable attribute” warning, and the doctor stays quiet (it does not contradict that signal with an un-actionable reseal hint).

Dry-run / preview

Nothing to integrate. No braid plan-and-execute command seals the mountpoint, so ADR 022 imposes no obligation here: the seal is an ambient systemd-unit-managed invariant (the same class as the tmpfiles d ${cfg.mountPoint} rule), applied by the boot/activation oneshot outside the plan/preview/execute model.

See

  • modules/braid/storage.nix – the braid-seal-mountpoint oneshot.
  • cli/src/mountpoint_guard.rs – the guard, the seal site, and the maintenance levers.
  • cli/src/doctor.rs#classify_mountpoint_immutability – the detection signal.
  • ADR 018: Systemd lifecycle – the unit lifecycle model.
  • Mounting subvolumes – the separate-path caveat.

Decision: Release process

Context

braid had no releases: no git tags, the version hardcoded in two places, and the sole consumer (caja) tracked master HEAD. We want a repeatable just release patch|minor|major that bumps + tags + publishes, a binary cache so consumers do not recompile Rust on the NAS, and a “pin to latest release” story.

The hard constraint that shaped the design: the maintainer’s Mac cannot build the x86_64-linux binary consumers need. The nix-darwin linux-builder advertises aarch64-linux only; x86 emulation is intentionally omitted. So the x86_64-linux build and the cache push run in GitHub Actions on the release tag, not locally.

A precondition: the braid repo is public. That makes GitHub-hosted Actions runners free, lets the github:danneu/braid?ref=release flakeref resolve without a token, and keeps CACHIX_AUTH_TOKEN unexposed to forks.

Decision

Consumer pin = a moving release branch

The release fast-forwards a release branch to each vX.Y.Z tag’s commit. Consumers pin braid.url = "...?ref=release"; nix flake update braid is the “upgrade to newest release” button, and flake.lock still pins the exact rev. This is the ecosystem convention for a release channel (NixOS/nix latest-release, cachix latest) and mirrors how a consumer already follows a nixos-26.05 branch and lets the lockfile pin.

The release branch is machine-owned: only release.yml advances it, and only to a master-descended commit (enforced by the ancestry guard below). Never commit to it, and ensure no branch protection blocks the Actions token’s push.

Version single source of truth = cli/Cargo.toml

flake.nix#commonArgs reads pname + version from cli/Cargo.toml via craneLib.crateNameFromCargoToml (a pure path read, no IFD). braid --version already reads CARGO_PKG_VERSION from the same manifest via clap. So cli/Cargo.toml is the only version string in the repo, and cargo release bumping it is the only version edit.

The invariant is enforced, not merely conventional: the flake check eval-version-matches-cargo (tests/eval/version-matches-cargo.nix) asserts the built braid-cli-unwrapped.version equals cli/Cargo.toml’s [package] version. It is trivially true while the flake reads from the manifest, but fails loudly if anyone reintroduces a hardcoded flake literal that then drifts.

Version bump = cargo-release; build + publish in CI

just release (Mac-side) runs cargo release from the workspace root, so its config lives in [workspace.metadata.release] in the root Cargo.toml. Two independent publish guards: [workspace.metadata.release] publish = false stops cargo release from touching crates.io, and [package] publish = false in cli/Cargo.toml makes a direct cargo publish refuse outright. tag-name = "v{{version}}" overrides cargo-release’s workspace-member default (braid-cli-v{{version}}), because the release-branch FF and gh release flow assume vX.Y.Z.

Pre-1.0 bumps are plain semver: patch 0.0.1->0.0.2, minor->0.1.0, major->1.0.0 (so minor’s jump to 0.1.0 is expected, not a surprise). The in-tree pre-release version is 0.0.0, so the first just release patch cuts v0.0.1 through the same path as every later release – there is no special-case bootstrap.
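
The bump arithmetic is ordinary semver and can be checked with a few lines. This sketch only illustrates the rule; the Version and Level types are hypothetical and it does not model cargo-release itself.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Version { major: u64, minor: u64, patch: u64 }

#[derive(Debug, Clone, Copy)]
enum Level { Patch, Minor, Major }

/// Plain semver bump: raising a component resets every lower component to 0.
fn bump(v: Version, level: Level) -> Version {
    match level {
        Level::Patch => Version { patch: v.patch + 1, ..v },
        Level::Minor => Version { minor: v.minor + 1, patch: 0, ..v },
        Level::Major => Version { major: v.major + 1, minor: 0, patch: 0 },
    }
}
```

Starting from the in-tree 0.0.0, a patch bump gives 0.0.1, which is why the first release needs no special case.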

The tag triggers .github/workflows/release.yml, a single sequential job ordered cheapest-gate-first: ancestry guard -> tag/version guard -> Rust test + version eval gate -> build x86_64-linux -> push cache -> create GitHub release -> fast-forward release. Two guards close the trust gap before any build or cache write: an ancestry guard (git merge-base --is-ancestor) rejects any v* tag whose commit is not on master, and a tag guard rejects any tag that does not match ^vX.Y.Z$ or does not equal cli/Cargo.toml’s version at the tagged commit. The release FF is the last step and the sole consumer-visible “it’s released” gate: it lands only after the cache is warm and the GitHub release object exists, so no consumer can nix flake update to a half-published rev. Every step is idempotent, so a failed run is re-runnable from the Actions UI.

The GitHub release body is rendered by git-cliff from cliff.toml; git-cliff is pinned in the .#release devShell and invoked as nix develop .#release -c git-cliff. Conventional commit types are grouped into stable sections such as Features, Bug Fixes, Documentation, Tests, CI, Build, and Chores, while unmatched commit subjects land in Other. The first release (v0.0.1) is a one-time exception and publishes an intentionally blank body instead of a whole-history changelog; later genuinely empty rendered ranges get the _No notable changes._ placeholder.
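
The tag/version guard above amounts to a shape check plus a string comparison. The workflow implements it in its own terms; the Rust below is only a model of the rule, with hypothetical function names and the assumption that each of X, Y, and Z is a run of ASCII digits.

```rust
/// Parse a tag of the exact shape vX.Y.Z (digits only, three components).
fn parse_release_tag(tag: &str) -> Option<(u64, u64, u64)> {
    let rest = tag.strip_prefix('v')?;
    let mut parts = rest.split('.');
    let mut nums = [0u64; 3];
    for slot in nums.iter_mut() {
        let part = parts.next()?;
        if part.is_empty() || !part.bytes().all(|b| b.is_ascii_digit()) {
            return None;
        }
        *slot = part.parse().ok()?;
    }
    // A fourth component (or anything after Z) is rejected.
    if parts.next().is_some() {
        return None;
    }
    Some((nums[0], nums[1], nums[2]))
}

/// The guard passes only when the tag is well-formed AND names the
/// manifest's version at the tagged commit.
fn tag_matches_manifest(tag: &str, manifest_version: &str) -> bool {
    parse_release_tag(tag).is_some() && tag.strip_prefix('v') == Some(manifest_version)
}
```

Under this model the cargo-release workspace-member default braid-cli-v0.0.1 fails the shape check, which is the case the tag-name override exists to avoid.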

Public cachix cache braid

The cache is public; consumers add the substituter https://braid.cachix.org + its public key and need no auth. release.yml sets skipPush: true on cachix-action and does one explicit cachix push braid <out>, so only braid-cli-unwrapped (x86_64-linux) lands in the cache – exactly what the module default (flake.nix#nixosModules.default, which sets package = braid-cli-unwrapped) consumes. The wrapped braid would duplicate all storage tools for no consumer benefit.

Behavioral gate is local, not in CI

braid does not run the NixOS VM suite in GitHub Actions, and neither just release nor release.yml requires a VM result. The release path runs the Rust tests (just test-rust) and the version-SoT eval check in release.yml on the tag, then builds and publishes. .github/workflows/test.yml stays workflow_dispatch-only (its push/pull_request triggers remain disabled).

VM coverage is a manual, per-release choice: when a release warrants it, run the suite outside the release automation – just test-vm locally, or a workflow_dispatch run of test.yml. just release keeps a cheap local compile gate (nix build braid-cli-unwrapped, darwin-native) so a Rust compile break is caught before the irreversible tag, but it does not gate on a VM run.

This is a deliberate scope choice: the VM suite is slow and runs on the maintainer’s machine through the linux-builder, and keeping it out of the release path keeps releases fast and free of VM flakiness, without turning the expensive VM workflow into a push-triggered CI gate. The tradeoff is that release behavioral coverage rests on maintainer discipline, not an automated CI gate. Revisit-if braid gains additional maintainers or the VM suite becomes cheap enough to run in CI; at that point a master VM gate plus a fail-closed parent check in just release would make the behavioral gate automatic.

This ADR is the authoritative home for the cache-path-identity rationale; ADR 010 points here. The recommended consumer snippet does not set braid.inputs.nixpkgs.follows. With no follows, braid’s nixpkgs input stays on its pinned nixos-26.05 – the exact nixpkgs the release cache is built against – so braid-cli-unwrapped resolves to the cached store path: a cache hit. Setting follows = "nixpkgs" rebuilds braid-cli-unwrapped against the consumer’s nixpkgs, producing a different store path and a cache miss (the NAS recompiles Rust, defeating the cache). follows remains a valid advanced opt-out (smaller closure via nixpkgs dedup) at the cost of release-cache path identity; it also moves the pinned tool versions onto the consumer’s nixpkgs (see ADR 010).

This aligns docs with reality: the deployed consumer already runs no-follows with a deliberate “do NOT set follows” tool-version-boundary comment.

Public-repo trust model (every secret-bearing workflow)

Going public widens the threat model for every workflow that consumes a secret, not just the release path:

  • release.yml is fork-safe by trigger: push: tags only, no pull_request, and forks cannot push tags upstream, so CACHIX_AUTH_TOKEN never reaches a fork.
  • claude.yml is not trigger-safe – it fires on public issue/comment/review events. It is hardened with a trusted-author gate: each event arm ANDs the @claude trigger with author_association in OWNER/MEMBER/COLLABORATOR, so a stranger’s @claude comment never starts the job and cannot spend CLAUDE_CODE_OAUTH_TOKEN.

Any future workflow that consumes a secret must re-clear this bar before it merges.
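
The trusted-author gate is a two-term conjunction. In the workflow it is an expression on the event payload; the Rust function below is only a model of that predicate, and its name and the substring match on the comment body are assumptions.

```rust
/// Model of the claude.yml gate: the job starts only when the comment
/// mentions the trigger AND the author is a trusted association.
fn should_start_job(comment_body: &str, author_association: &str) -> bool {
    let triggered = comment_body.contains("@claude");
    let trusted = matches!(author_association, "OWNER" | "MEMBER" | "COLLABORATOR");
    triggered && trusted
}
```

A stranger's comment carries an association such as NONE or CONTRIBUTOR, so the second term is false and the secret-bearing job never starts.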

Risks / gotchas

  • publish = false (both layers) is mandatory – braid is a private crate that must never reach crates.io.
  • Dangling tag on CI failure: the tag exists but the release FF (the last step) never runs, so release does not advance and consumers are unaffected no matter which earlier step failed. Recover by re-running the same workflow (transient/config failures) or by fixing master and moving the same version tag to the fixed commit (the runbook has exact commands).
  • Cache trust on the consumer: skip the public-key step and the consumer reaches the cache but rejects the signature and silently rebuilds from source.
  • Master protection vs. the bump push: cargo release pushes the bump commit straight to master; a required-PR ruleset on master that does not exempt the releaser makes just release fail after the local commit/tag. Keep the releaser exempt.
  • Concurrent releases: concurrency serializes release runs (cancel-in-progress: false) and queue: max lets up to 100 later tags wait (FIFO by the time each starts waiting on the group), so a burst of tags drops no release. Because that order is wait-start time, not dispatch time, a near-simultaneous burst can still start out of order and fail an older tag’s release FF as a non-fast-forward (benign – release only moves forward – but a red run). The runbook rule stays one active release tag at a time.

See

  • justfile – the release recipe (Mac-side bump + local gates).
  • .github/workflows/release.yml – CI build, cache push, GitHub release, release FF.
  • cliff.toml – git-cliff template + commit-group config for the GitHub release-notes body.
  • tests/eval/version-matches-cargo.nix – the version single-source-of-truth eval guard.
  • Releasing – the operator runbook.
  • Toolchain pinning – no-follows default and parser-critical tool pinning.

Decision: SMART + btrfs error reporting

Context

Before this change, braid status reported btrfs device-error counters per disk but nothing about SMART. The only SMART signal status surfaced was a global smartd alert flag. The TUI showed SMART solely as a bare health enum (ok/warning/failing) in a column, with no way to see why a drive was degraded. SMART health was computed in parse_smartctl and then discarded: classify_sata/classify_nvme read the underlying counters (reallocated/pending/uncorrectable sectors; NVMe media errors, wear, spare) only to collapse them to a SmartHealth enum.

These are observations from two different layers: the filesystem’s own I/O accounting (btrfs device errors) versus the drive’s self-report (SMART). A degraded drive can show clean btrfs I/O, and a drive with btrfs errors can pass SMART. They should be surfaced as two explicitly-named concepts, not merged behind one vague “Errors” label.

Decision

Two named concepts, not one merged “Errors”. The --json per-disk field errors renames to btrfs_errors; a new sibling object smart carries the SMART self-report. The human per-disk block relabels its Errors: line to btrfs: and gains a parallel SMART: line. (braid is pre-v1.0 with no on-disk-format backwards-compatibility obligation, so the field rename is a hard break with no shim.)

smart is a verdict plus evidence, not a flat count. SMART’s authoritative signal is a pass/fail verdict (health); the counters are supporting evidence behind it. A single summed smart_errors integer was rejected: it mixes units (reallocated sectors, wear percent, media errors, spare percent are not addable) and would render 0 on a drive reporting passed:false – the exact case where the operator most needs a signal. So health is the headline and the counters are itemized beneath it.

A protocol discriminator (sata/nvme). The evidence field set differs by transport (SATA ATA attributes vs the NVMe health-information log), so the smart object is tagged by protocol to keep the shape unambiguous and forward-compatible. NVMe is fully implemented, not deferred: media_errors is a clean headline parallel to SATA reallocated_sectors, and the NVMe spare check is a threshold pair (available_spare <= available_spare_threshold), not a generic > 0 rule – a flat numeric rule would misread a healthy available_spare of 100.

One threshold definition feeds all three surfaces. SmartEvidence::fields() yields each display field as (key, value, is_concern); concerns() is its is_concern subset. The verdict (Healthy iff concerns().is_empty()), the human SMART: parenthetical, and the TUI evidence rows (red iff is_concern) all key off this one structure and a per-field SmartField::label(), so the column verdict, the human line, and the TUI rows cannot disagree on either the threshold or the wording.
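
A minimal sketch of the single-definition idea follows. Only fields(), concerns(), and the field names reallocated_sectors, media_errors, available_spare, and available_spare_threshold come from this decision; the enum layout, the pending_sectors and uncorrectable_sectors names, the > 0 rule for counters, and is_healthy() are assumptions. NVMe wear is omitted because its threshold is not stated here.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
enum SmartEvidence {
    Sata { reallocated_sectors: u64, pending_sectors: u64, uncorrectable_sectors: u64 },
    Nvme { media_errors: u64, available_spare: u8, available_spare_threshold: u8 },
}

impl SmartEvidence {
    /// Every display field as (key, value, is_concern). Thresholds live only here.
    fn fields(&self) -> Vec<(&'static str, u64, bool)> {
        match *self {
            SmartEvidence::Sata { reallocated_sectors, pending_sectors, uncorrectable_sectors } => vec![
                ("reallocated_sectors", reallocated_sectors, reallocated_sectors > 0),
                ("pending_sectors", pending_sectors, pending_sectors > 0),
                ("uncorrectable_sectors", uncorrectable_sectors, uncorrectable_sectors > 0),
            ],
            SmartEvidence::Nvme { media_errors, available_spare, available_spare_threshold } => vec![
                ("media_errors", media_errors, media_errors > 0),
                // Threshold pair, not `> 0`: a spare of 100 is healthy.
                (
                    "available_spare",
                    u64::from(available_spare),
                    available_spare <= available_spare_threshold,
                ),
            ],
        }
    }

    /// The subset of fields that crossed their threshold.
    fn concerns(&self) -> Vec<(&'static str, u64, bool)> {
        self.fields().into_iter().filter(|f| f.2).collect()
    }

    /// Verdict derived from the evidence, never computed separately.
    fn is_healthy(&self) -> bool {
        self.concerns().is_empty()
    }
}
```

Because the verdict, the human line, and the TUI rows would all read fields(), changing a threshold in one place changes all three surfaces together.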

Column-summary vs detail-evidence split. The TUI disk-table column stays the bare health verdict (unchanged). The error evidence lives in the per-disk detail panel as a new SMART section, sibling to the existing btrfs Device Errors section. celsius ships in the --json smart object but is not shown in the SMART detail section (it has its own Temp column and is not a verdict input).

status probes smartctl plainly, per disk. Each braid status now spawns one smartctl -H -A --json per present disk (reusing the command the TUI already runs). No -n standby guard is needed: status reaches this live SMART probe only for a mounted pool, and ADR 031 treats mounted member disks as awake. The future locked-only braid.autoSpinDown does not overlap this mounted-only probe. The probe is failure-tolerant – any error collapses to an unknown verdict – so a flaky or absent smartctl never fails a status build. This affects only the CLI status path, not the monitor daemon.

Per-disk smart is diagnostic evidence only – it does not feed the alert latch. The “SMART health warning” alert cause stays AlertCause::SmartdAlert, driven by the smartd daemon’s flag (/var/lib/braid/smartd-alert; see ADR 014). A live smart.health == "warning" from the new per-disk probe must never synthesize an AlertCause. So a status report can show a degraded smart object while alert_active is false – this is intentional, and is documented so the two SMART signals (the live diagnostic probe vs the smartd latch) are not conflated. smartd remains the single SMART alert source because it watches continuously between status runs and applies its own vendor thresholds; the live probe is a point-in-time diagnostic.

Consequences

  • The serialized contract grows: SmartProbe / SmartHealth / SmartEvidence are now part of the --json surface. The stable-only smartctl golden fixture is the drift canary on smartmontools bumps (virtio disks emit no SMART, so the live VM canary cannot exercise this path).
  • classify_sata / classify_nvme / classify_health are removed; their thresholds now live once in SmartEvidence::fields. The verdict is derived from the evidence at a single call site, so the column, the human line, and the TUI detail cannot drift apart.
  • Every braid status now does one synchronous smartctl spawn per disk. This is accepted per the mounted-pool drive-wake posture above.

See

Decision: Drive-wake posture

Principle: HDD defaults

Context

braid previously treated all pool disks as potentially asleep at any time. The TUI reflected that blanket anti-wake posture by keeping the expensive pool probe manual-only: smartctl -H -A --json per disk plus btrfs state was run only on startup or r.

That posture was broader than the actual ownership boundary. Today braid does not park drives with hdparm -S or any equivalent per-drive standby timer. While the pool is mounted, btrfs and normal NAS access already make member drives active. A future opt-in braid.autoSpinDown may park drives, but that feature belongs to the locked state and will own the “do not wake a parked locked member” rule.

Decision

While the pool filesystem is mounted (PoolStatus::Mounted in the TUI, pool.mounted in status/doctor), braid treats member disks as awake and reads/refreshes them freely. The anti-wake concern applies only to the locked state and is owned by the future opt-in braid.autoSpinDown feature, which will gate on braid-online.service. braid adds no online-side standby detection.

The automatic read boundary is narrow: the TUI pool auto-refresh loop is the only automatic disk-read loop added here, and it re-arms only while the live pool is mounted. It does not run while the pool is locked, not mounted, or in an error state.

The user-invoked read paths keep their existing semantics:

  • braid status live SMART is mounted-only; it returns the not-mounted status before spawning per-disk smartctl.
  • braid doctor SMART self-test diagnostics may target a member’s persisted by_id path even when the pool is unmounted or locked.
  • TUI Browse-tab SMART commands are explicit user actions and may target a member’s persisted by_id path regardless of mount state.

“Online” in this decision means the mounted live pool (PoolStatus::Mounted or pool.mounted). braid-online.service is a correlated systemd lifecycle marker, not something these read paths consult; it is the handle the future braid.autoSpinDown feature will gate on.

Locked-state TUI probes are not claimed to be non-waking. The pre-existing TUI startup/manual probe builds LUKS state before the mount gate and reads on-disk LUKS2 metadata with cryptsetup luksDump --dump-json-metadata <by-id> for locked members. That read can wake a parked drive. This decision leaves that behavior unchanged and only ensures the new automatic pool loop is mount-gated.

Non-goals

  • No smartctl -n standby.
  • No Standby SMART health state.
  • No braid.autoSpinDown implementation.
  • No hdparm integration.
  • No NixOS module changes.
  • No smartd configuration changes.
  • No change to status’s mounted-only SMART probe.
  • No change to explicit doctor or Browse-tab SMART reads.
  • No auto-refresh while locked.

Alternatives considered

Online-side standby detection

Rejected. Adding smartctl -n standby and a Standby health state creates parser and UI state complexity for a state braid does not create while the pool is mounted. If an operator has configured an out-of-band standby timer, the cost of being wrong is a single wake-on-read, matching today’s explicit reads.

Blanket anti-wake posture

Rejected. Keeping every pool probe manual-only forces interactive monitoring to behave like locked-state recovery. The locked state is the only state where braid-managed drive parking belongs, so the locked-state feature should own that rule instead of leaking it into mounted-pool UX.

See

LUKS Unlock: Research Notes

Reference material for braid’s unlock mechanisms. Covers gotchas, security considerations, and design rationale discovered during implementation.

USB device naming stability

/dev/sdX names are assigned by probe order and shift when devices are added, removed, or enumerated differently across reboots. A USB stick that was /dev/sdd can become /dev/sdc if another drive is unplugged.

/dev/disk/by-id/ paths use hardware serial numbers reported by the device firmware and are stable across reboots and topology changes. Always use by-id for any persistent reference to a block device.

# Unstable — changes when drives are added/removed:
/dev/sdd

# Stable — tied to hardware serial, survives reboot and topology changes:
/dev/disk/by-id/usb-Kingston_DataTraveler_3.0_E0D55EA573FCF450-0:0

See: Arch Wiki — Persistent block device naming
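A persistent reference can be checked for this property before it is stored. The sketch below is an illustration only; the function name is_stable_device_ref is hypothetical, and the check is purely lexical (it does not resolve symlinks or ".." components).

```rust
use std::path::Path;

/// Returns true when `path` names an entry under /dev/disk/by-id, i.e. a
/// reference that stays valid across reboots and topology changes.
/// `Path::starts_with` compares whole components, so a sibling directory
/// such as /dev/disk/by-id-other does not match.
fn is_stable_device_ref(path: &Path) -> bool {
    let by_id = Path::new("/dev/disk/by-id");
    path.starts_with(by_id) && path != by_id
}
```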

Passphrase file vs binary keyfile

braid enrolls and opens both the shared passphrase and the auto-unlock keyfile as LUKS keyslot secrets, so cryptsetup stretches both through the keyslot KDF (Argon2id by default for LUKS2). Neither is a raw dm-crypt volume key. The two differ in transport, byte handling, and which slot they occupy – not in whether a KDF runs.

  • Passphrase (slot 0): braid trims a trailing newline and rejects embedded line breaks (cli/src/luks.rs#finalize_passphrase_bytes), then pipes the bytes to cryptsetup via --key-file=- with no --keyfile-size (a passphrase is variable-length). Designed to protect a low-entropy human-chosen secret.

  • Binary keyfile (slot 1): exactly 4096 bytes read via --keyfile-size 4096, with no newline trimming. braid enforces the exact size before handing the path to cryptsetup (cli/src/luks.rs#validate_user_keyfile_path). High entropy, but still a KDF-protected keyslot secret – not a raw key.

The passphrase and the keyfile are never interchangeable – not even byte-for-byte identical inputs – for a fundamental reason: each LUKS keyslot carries its own salt, so slot 0 and slot 1 derive different keys from identical KDF input. Secondarily, at the cryptsetup level the bytes that reach the KDF can also differ: a passphrase file containing hunter2\n feeds hunter2 (the trailing newline is trimmed) while a keyfile of the same bytes feeds hunter2\n verbatim. That byte example is illustrative only – braid’s keyfile is always exactly 4096 random bytes (anything else is rejected by validate_user_keyfile_path), so the literal “same bytes” case never arises in practice. The claim to reject is that one path skips a KDF; both run it.
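The byte-handling difference can be sketched as follows. This is an illustration under stated assumptions, not the code in cli/src/luks.rs: the function names and signatures are hypothetical, and only "\n" is treated as a line break because the text does not say how carriage returns are handled.

```rust
/// Size the text gives for braid's binary keyfile.
const KEYFILE_SIZE: usize = 4096;

/// Passphrase path: drop one trailing newline, then refuse any remaining
/// line break. Hypothetical stand-in for `finalize_passphrase_bytes`.
fn finalize_passphrase(bytes: &[u8]) -> Result<&[u8], &'static str> {
    let trimmed = bytes.strip_suffix(&[b'\n']).unwrap_or(bytes);
    if trimmed.contains(&b'\n') {
        return Err("passphrase contains an embedded line break");
    }
    Ok(trimmed)
}

/// Keyfile path: no trimming at all, only an exact-size check.
fn keyfile_len_is_valid(len: usize) -> bool {
    len == KEYFILE_SIZE
}
```

With these definitions, the bytes hunter2\n become hunter2 on the passphrase path, while the keyfile path would reject them outright for not being 4096 bytes long.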

A genuinely raw dm-crypt volume key would require --volume-key-file, which braid forbids: it is in the MANAGED_LUKS_FORMAT_LONG_FLAGS denylist (cli/src/types.rs), so braid refuses to let it reach luksFormat. The passphrase-vs-keyfile --keyfile-size argv asymmetry is pinned by the block comment above the test cli/src/cmd.rs#cryptsetup_luks_open_omits_keyfile_size.

LUKS2 provides up to 32 keyslots per device; braid uses slot 0 for the passphrase and slot 1 for the keyfile.

See: cryptsetup(8) – key-file processing (the man page’s “passed directly in dm-crypt” / no-digest note is scoped to the plain device type, not LUKS), Arch Wiki – dm-crypt/Device encryption

Keyfile creation target invariant

Any braid command path that creates or overwrites braid.key in a user-supplied directory must verify that directory exists, is a directory, and is an active mount point both at plan time and again immediately before writing braid.key. The plan-time check alone is insufficient: the seconds-long window between planning and the actual write (passphrase prompt, Argon2 --test-passphrase verify against every pool disk, per-disk luksDump slot inventory) lets a USB device be unmounted (manual umount, hot-unplug, systemd-automount idle timeout) after the gate passes, which would otherwise let the keyfile land on the host root filesystem.
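The check-twice pattern can be sketched like this. Everything here is an assumption for illustration: the text does not say how braid detects a mount point, so the sketch uses the common Unix heuristic of comparing device numbers with the parent directory, which does not detect bind mounts of the same filesystem. The names is_mount_point and write_keyfile_guarded are hypothetical.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// Heuristic: a directory is a mount point when it sits on a different
/// device than its parent, or when it is the filesystem root.
fn is_mount_point(dir: &Path) -> io::Result<bool> {
    let meta = fs::metadata(dir)?;
    if !meta.is_dir() {
        return Ok(false);
    }
    let parent = fs::metadata(dir.join(".."))?;
    Ok(meta.dev() != parent.dev() || meta.ino() == parent.ino())
}

/// Checks the target at plan time, runs the slow steps, then checks again
/// immediately before creating braid.key.
fn write_keyfile_guarded(dir: &Path, key: &[u8], slow_steps: impl FnOnce()) -> io::Result<()> {
    let refuse = || io::Error::new(io::ErrorKind::Other, "target is not an active mount point");
    if !is_mount_point(dir)? {
        return Err(refuse()); // plan-time gate
    }
    slow_steps(); // passphrase prompt, credential verify, slot inventory
    if !is_mount_point(dir)? {
        return Err(refuse()); // the device went away during the window
    }
    fs::write(dir.join("braid.key"), key)
}
```

The second check narrows the window to the few instructions between the check and the write; it does not remove it entirely.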

This currently applies to braid enroll DIR --generate. Existing-keyfile consumers may read from ordinary admin-controlled paths and must not require a mount point:

  • braid enroll DIR without --generate
  • braid add --enroll DIR
  • braid replace --enroll DIR
  • braid unlock --key-file PATH
  • braid.autoUnlock reading /run/braid-key/mnt/braid.key

Plaintext keyfile exposure (Unraid CVE)

Unraid stores the LUKS passphrase in plaintext at /root/keyfile on persistent storage. This means anyone with root access or physical access to the boot drive can read the encryption passphrase — the encryption is effectively defeated at rest.

See: Unraid forum — LUKS password stored in plaintext at /root/keyfile

braid avoids this in three ways:

  1. No local storage. The keyfile lives on a removable USB device and is never copied to the host filesystem.
  2. Mount-read-unmount. The auto-unlock service mounts the USB read-only, reads the keyfile, then unmounts immediately. The keyfile is not accessible on the filesystem after unlock completes.
  3. Restricted mount root. The USB is mounted at /run/braid-key/mnt, under a parent directory /run/braid-key that remains 0700 root:root. Non-root users cannot traverse the parent regardless of the USB filesystem’s root inode permissions, so the keyfile stays unreachable during the mount window.

Credential memory hygiene

Passphrase buffers in the CLI are Zeroizing<...> from read to drop (cli/src/luks.rs::read_line_into_zeroizing, cli/src/luks.rs::read_file_into_zeroizing), and subprocess delivery is stdin-only with no argv argument or temporary file. Generated keyfile bytes are zeroized after write (cli/src/enroll_key_file.rs::generate_key_file). Passphrases and keyfile bytes never enter the Nix store; the upsmon token is generated at runtime per decision 020, and the USB keyfile lives only on the USB stick mounted into /run/braid-key/mnt/ as hardened in commit df706c44875f.

Boot resilience: nofail + device-timeout

The USB mount uses nofail and x-systemd.device-timeout=Ns, plus noauto so that it is started on demand rather than at boot. Together these guarantee the USB device never blocks boot:

  • nofail: systemd does not treat a failed mount as a boot failure.
  • x-systemd.device-timeout: systemd waits at most N seconds for the block device to appear, then gives up.
  • noauto: the mount is not started at boot; it is triggered on-demand by the automount unit when the auto-unlock service accesses the mount point.

If the USB stick is not plugged in, the automount times out, the auto-unlock service sees no key file, logs an informational message, and exits 0. Boot continues normally; the pool stays locked for manual unlock.

Header backup workflow and messaging

LUKS header backups protect against on-disk header corruption. braid’s add, replace, and enroll_key_file create local .luksheader files at /var/lib/braid/luks-headers/<disk>.luksheader as a transient byproduct – they are not the intended backup target. The product workflow is:

  1. braid writes a local .luksheader during a header-mutating operation.
  2. The user exports the header off-system (USB, second machine, cloud key storage, etc.).
  3. The user removes the local copy. braid status and the TUI warn while a local copy persists, because its continued presence on the same machine defeats the off-system backup model.

Messaging invariant

User-facing recovery, restoration, and backup-status messages – in doctor, status, unlock errors, the TUI, or any new command – must NOT reference local /var/lib/braid/luks-headers/*.luksheader files. Recovery guidance is generic: “restore from your off-system LUKS header backup if you have one.” Specifically:

  • Never branch on whether a local .luksheader file exists.
  • Never call Path::exists on paths.luks_headers_dir().join(...) to change user-visible advice.
  • Never tell users to run cryptsetup luksHeaderRestore --header-backup-file /var/lib/braid/....

If doctor pointed users at the local files, the product would be internally inconsistent: status and the TUI warn about the same artifact doctor would tell users to depend on. Generic guidance is the right answer even if the local backup happens to be present and would technically work.

Red flags when reviewing recovery messaging: /var/lib/braid/luks-headers/, .luksheader, luks_headers_dir(), and any Path::exists against a backup path.
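The string-level red flags lend themselves to a simple review aid. The sketch below is hypothetical (braid is not stated to ship such a check); it scans a candidate message or source snippet for the substrings listed above and leaves the Path::exists case to human review.

```rust
/// Substrings the text lists as red flags in recovery messaging.
const RED_FLAGS: [&str; 3] = [
    "/var/lib/braid/luks-headers/",
    ".luksheader",
    "luks_headers_dir()",
];

/// Returns every red flag that occurs in `text`, in the order listed above.
fn red_flags_in(text: &str) -> Vec<&'static str> {
    RED_FLAGS
        .iter()
        .copied()
        .filter(|flag| text.contains(*flag))
        .collect()
}
```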

Open-failure header diagnosis

Unlock is two-phase. plan_open_pool probes every declared disk and classifies it (ConfigDiskState); the disks it hands to execute_unlock_and_mount as to_unlock are exactly the ones it found PresentLuks – header intact, both luksUuid and luksDump succeeded at plan time. execute_unlock_and_mount then verifies the credential and opens each disk.

When verify or open fails, open_disks_with_credential re-probes the header at failure time and routes the result through explain_open_failure:

  • Unreadable – emit the off-system-backup guidance (per the messaging invariant above).
  • Ok – the header is intact, so the original cryptsetup/verify error is passed through verbatim (e.g. a genuine wrong passphrase).
  • ProbeFailed – the probe itself could not run, so braid reports that diagnosis is incomplete rather than guessing a cause.

The failure-time re-probe is deliberate, not redundant. Because the to_unlock disks were PresentLuks by construction, the planner holds no header-damage observation to thread in – there is nothing to reuse. The header can still change in the plan->open window (external dd, a hardware fault, a swapped device), and the failure-time probe is exactly what keeps a wiped or damaged header from being misdiagnosed as a “wrong passphrase”.

probe_luks_header -> LuksHeaderState is the single header-damage classifier; ConfigDiskState is a separate, coarse membership gateway, so the two neither duplicate nor drift.
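The three-way routing can be sketched as a plain match. The variant names come from the list above; the payload-free enum, the function signature, and the exact message wording (other than the quoted generic backup guidance) are assumptions.

```rust
/// Header states named in the text; payloads are omitted in this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LuksHeaderState {
    Ok,
    Unreadable,
    ProbeFailed,
}

/// Hypothetical shape for `explain_open_failure`: combines the failure-time
/// probe result with the original error text.
fn explain_open_failure(state: LuksHeaderState, original_error: &str) -> String {
    match state {
        // Damaged header: generic off-system guidance, never a local path.
        LuksHeaderState::Unreadable => String::from(
            "LUKS header is unreadable; restore from your off-system LUKS header backup if you have one",
        ),
        // Intact header: pass the cryptsetup/verify error through verbatim.
        LuksHeaderState::Ok => original_error.to_string(),
        // Probe could not run: report incomplete diagnosis instead of guessing.
        LuksHeaderState::ProbeFailed => format!(
            "{} (header probe could not run; diagnosis is incomplete)",
            original_error
        ),
    }
}
```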

Unparseable state-file reconciliation

There are two state files that can block normal operation when they are unparseable: /var/lib/braid/pool.json and /var/lib/braid/pending-op.json.

For a corrupt or off-schema pool.json, the remediation phrase is:

run 'braid discover --write' to rebuild from existing disks (with all intended pool members attached; see docs/internals/luks-unlock.md)

Confirm the attached disks are the intended pool members, then run braid discover --write – the corrupt file is overwritten in place and the original bytes are preserved at pool.json.corrupt-<RFC3339-UTC> next to it. The snapshot is a hard precondition for the rebuild: if it cannot be written (full disk, read-only state directory), discover --write refuses with failed to snapshot existing corrupt file to ... so the corrupt original is not destroyed; free disk space or fix permissions and retry. The sidecar is safe to remove once you have manually copied any still-relevant prior-binding bytes (e.g. devid for a null_underlying member). If you know the expected member count ahead of time, pass --expect-count <N> to fail closed against a temporarily detached disk or an unrelated braid-labeled disk being silently admitted.

Note: braid lock – the user-facing command, the braid-online.service ExecStop path, and braid lock --dry-run alike – does NOT fail under a missing or corrupt pool.json. It warns and proceeds with empty membership; every observed braid-* mapper is then verified by its backing LUKS UUID before close, so shutdown cleanup stays complete. No lock pathway hard-fails on an unloadable pool.json.

For a healthy UUID-keyed pool.json, do not run discover --write at all – use braid add / braid remove / braid replace to mutate membership. discover --write is a repair tool, not a refresh; running it against a healthy file refuses (is already a healthy UUID-keyed membership) so it does not drop persisted devid bindings (decision 024).

For an unparseable pending-operation journal, the remediation phrase is:

Remove /var/lib/braid/pending-op.json after manual reconciliation (see docs/internals/luks-unlock.md) and re-run.

It is safe to remove pending-op.json only when one of these is true:

  • The operation has not yet committed any disk-level mutation: no LUKS format was applied, no btrfs device add ran, and no fresh-format target was opened.
  • The user has confirmed with braid status that the live pool already reflects the intended state and the journal entry is stale.

It is not safe to remove pending-op.json when a partially completed mutation is in flight, such as mkfs.btrfs succeeding but btrfs device add not yet running, or a replace paused mid-rebuild. In those cases, follow the recovery scenario guide instead.

Replace Target Size Preflight

braid replace mirrors btrfs’s own source-size authority by issuing BTRFS_IOC_DEV_INFO for the source devid and reading total_bytes, the same value btrfs replace start compares against. The ioctl is wrapped behind the BtrfsDevInfo trait so planning code can be unit-tested like the existing Filesystem boundary; production uses LinuxBtrfsDevInfo with nix::ioctl_readwrite!.

Target capacity is computed before opening the replacement mapper. Existing LUKS targets read the LUKS2 segment offset and size from cryptsetup luksDump --dump-json-metadata. Dynamic segments use raw - offset with no sector_size rounding: cryptsetup sizes the dm-crypt device exactly that way, and the kernel rejects, rather than rounds, a mapper whose size is not a multiple of sector_size, so an existing container’s capacity is exact at any sector_size. Fixed segments use segment.size directly. Fresh targets instead assume cryptsetup’s default 16 MiB LUKS2 offset, which holds because braid rejects --sector-size and offset-changing format flags. If any of those values cannot be read or parsed, or the computed target capacity is smaller than the source total_bytes, replace refuses before writing pending-op.json, formatting a fresh target, or opening the replacement mapper.
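The capacity arithmetic can be sketched as below. This is a minimal illustration assuming all sizes are already in bytes; the type ReplaceTarget and the functions target_capacity and preflight are hypothetical names, and parsing of the luksDump JSON is out of scope.

```rust
/// Default LUKS2 data offset the text assumes for freshly formatted targets.
const FRESH_LUKS2_OFFSET: u64 = 16 * 1024 * 1024;

/// The three target shapes described above, with sizes in bytes.
enum ReplaceTarget {
    ExistingDynamic { raw_bytes: u64, offset: u64 },
    ExistingFixed { segment_size: u64 },
    Fresh { raw_bytes: u64 },
}

/// Usable capacity of the target, or None when the offset exceeds the device.
fn target_capacity(target: &ReplaceTarget) -> Option<u64> {
    match *target {
        ReplaceTarget::ExistingDynamic { raw_bytes, offset } => raw_bytes.checked_sub(offset),
        ReplaceTarget::ExistingFixed { segment_size } => Some(segment_size),
        ReplaceTarget::Fresh { raw_bytes } => raw_bytes.checked_sub(FRESH_LUKS2_OFFSET),
    }
}

/// Refuses when the capacity is unknown or smaller than the source device.
fn preflight(target: &ReplaceTarget, source_total_bytes: u64) -> Result<u64, String> {
    let capacity = target_capacity(target)
        .ok_or_else(|| String::from("target is smaller than its LUKS2 data offset"))?;
    if capacity < source_total_bytes {
        return Err(format!(
            "target capacity {} is smaller than source total_bytes {}",
            capacity, source_total_bytes
        ));
    }
    Ok(capacity)
}
```

A target whose capacity equals the source total_bytes passes in this sketch, since the text only refuses a target that is smaller.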

Failed unlock cleanup

If braid unlock or a recovery mount path opens one or more LUKS mappers but fails before mounting the pool, braid fails closed for only the mappers opened by that command invocation.

Cleanup is scoped by the LUKS open helper’s ownership result:

  • Opened: braid created the mapper during this command and may close it on failure.
  • AlreadyOwned: the mapper was already open at execution time, including races where an operator opened it after planning. braid must not close it.

The cleanup sequence is:

  1. If any opened mapper path still exists under /dev/mapper, run scoped btrfs device scan --forget <paths> for those paths. Failure warns and cleanup continues.
  2. Close every opened mapper with the same retry-on-busy behavior as braid lock.

When no mapper was opened, cleanup is a silent no-op: there is no btrfs device scan --forget, no cryptsetup close, and no trailing cleanup summary. This is the expected wrong-passphrase shape.

After attempting non-empty cleanup, stderr includes one trailing summary line:

  • Success: cleanup: closed LUKS mappers opened by this command.
  • Failure: cleanup failed: one or more LUKS mappers opened by this command could not be closed; run 'braid lock' after resolving the issue. First cleanup error: ...

The original unlock or mount error remains the command’s primary error; cleanup output is secondary guidance and never replaces it.
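The ownership scoping and the trailing summary can be sketched as follows. Opened and AlreadyOwned are the names used above; the enum name OpenOutcome, both function signatures, and the omission of the retry logic are assumptions of this illustration.

```rust
/// Ownership result reported by the LUKS open helper.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum OpenOutcome {
    Opened,
    AlreadyOwned,
}

/// Selects only the mappers this command invocation created.
fn mappers_to_close<'a>(results: &[(&'a str, OpenOutcome)]) -> Vec<&'a str> {
    results
        .iter()
        .filter(|(_, outcome)| *outcome == OpenOutcome::Opened)
        .map(|(name, _)| *name)
        .collect()
}

/// Trailing stderr summary: None when nothing was opened (silent no-op).
fn cleanup_summary(opened: usize, first_error: Option<&str>) -> Option<String> {
    if opened == 0 {
        return None;
    }
    Some(match first_error {
        None => String::from("cleanup: closed LUKS mappers opened by this command"),
        Some(err) => format!(
            "cleanup failed: one or more LUKS mappers opened by this command could not be closed; run 'braid lock' after resolving the issue. First cleanup error: {}",
            err
        ),
    })
}
```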

Mount point permissions

Standard guidance for directories containing LUKS key material: the directory should be mode 0700 owned by root, and keyfiles should be mode 0400. Since braid mounts the USB read-only at /run/braid-key/mnt, file permissions are whatever the USB filesystem has – but the locked parent directory /run/braid-key prevents non-root users from traversing to the mounted files.

See: LUKS key file permissions

Device Disappearance States

When a physical drive disappears from a btrfs pool (hot-unplug, cable failure, drive death), the system passes through several states depending on how far the failure has progressed and whether the LUKS mapper is still open. Each state produces different output from btrfs filesystem show, btrfs device stats, and cryptsetup status — and braid must handle each combination correctly.

This mapping is not derivable from reading braid’s code or btrfs docs alone — it requires cross-tool knowledge that’s easy to get wrong.

State Table

| State | btrfs filesystem show | btrfs device stats | cryptsetup status | braid maps to |
|---|---|---|---|---|
| Healthy | path /dev/mapper/X | [/dev/mapper/X] | device: /dev/sdY | pool.devices |
| Null-underlying | path /dev/mapper/X | [/dev/mapper/X] | device: (null) | pool.null_underlying |
| MISSING with path | path /dev/mapper/X MISSING | [/dev/mapper/X] (??) | not queried | missing_devids only |
| Fully gone | path MISSING | [devid:N] | not queried | missing_devids |

Empirical note: SATA hot-unplug on real hardware enters Null-underlying immediately and stays there for at least 5 minutes without I/O pressure. We have not yet observed the MISSING-with-path state in practice. See real-world/sata-hot-unplug.md for full test results.
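Read as a decision procedure, the table maps three observations to one state. The sketch below is an illustration of that mapping only; the Observation struct, its field names, and the classify function are hypothetical and do not mirror probe_pool.

```rust
#[derive(Debug, PartialEq, Eq)]
enum DeviceState {
    Healthy,
    NullUnderlying,
    MissingWithPath,
    FullyGone,
}

/// Observations for one device, already extracted from tool output.
struct Observation<'a> {
    /// Mapper path from `btrfs filesystem show`, if one was printed.
    show_path: Option<&'a str>,
    /// Whether `btrfs filesystem show` printed the MISSING marker.
    show_missing: bool,
    /// The `device:` value from `cryptsetup status`; None when not queried.
    cryptsetup_device: Option<&'a str>,
}

/// Returns None for combinations the table does not cover.
fn classify(obs: &Observation<'_>) -> Option<DeviceState> {
    match (obs.show_missing, obs.show_path, obs.cryptsetup_device) {
        (true, Some(_), _) => Some(DeviceState::MissingWithPath),
        (true, None, _) => Some(DeviceState::FullyGone),
        (false, Some(_), Some("(null)")) => Some(DeviceState::NullUnderlying),
        (false, Some(_), Some(_)) => Some(DeviceState::Healthy),
        _ => None,
    }
}
```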

Healthy

Normal operation. Physical drive is present, LUKS mapper is open and points to the underlying block device, btrfs sees the device.

Null-underlying

Hot-unplug while mounted. The LUKS mapper (/dev/mapper/braid-X) is still open in device-mapper, but the backing block device has vanished. cryptsetup status reports device: (null). btrfs still sees the mapper path — it doesn’t know the physical drive is gone until I/O fails.

braid handles this correctly: probe_pool detects the (null) device, records it in pool.null_underlying, and monitor includes its devid in alert_missing_devids. The stats row reports both the mapper path and the devid; the alert pipeline pairs by devid directly.

Post-UUID-identity rule: when a mapper is null-underlying, the live LUKS UUID is not observable from the missing backing device. braid may bind that live mapper back to membership through persisted DiskMember.devid, but only for this restricted case. The persisted devid is prior-binding state, not display authority; status output still uses live btrfs stats for displayed devids.

MISSING with path

btrfs has registered the device as missing, but still remembers which mapper path it had. btrfs filesystem show appends MISSING to the path. The parser puts the devid into missing_devids but discards the path. probe_pool never processes this device (it only iterates show.devices), so it doesn’t appear in pool.devices or pool.null_underlying.

Handling: btrfs device stats rows always carry a mandatory devid field, so the alert pipeline identifies the row by devid regardless of which path string btrfs reports ([/dev/mapper/X] or [devid:N]). The MissingDevice alert is generated independently from missing_devids. Rows for alert-local missing devids are skipped for BtrfsDeviceErrors, while braid ack still snapshots their counters by devid so old counts do not re-alert if the member returns.

The same restricted devid fallback applies to membership correlation: when btrfs reports a missing device only by devid, braid can resolve the member whose persisted DiskMember.devid matches. It must not infer membership by parsing a mapper name or LUKS label.

Uncertainty: We haven’t empirically confirmed which path string btrfs device stats reports for a device in this state – the ?? in the table marks this. The answer no longer affects correctness (devid drives the lookup), but it would still be useful empirical data.

Fully gone

Device is completely absent — either the LUKS mapper was torn down, or the device was missing at mount time (degraded mount). btrfs filesystem show reports bare path MISSING (no mapper path). The pinned btrfs-progs renders the missing-device stats path as [devid:N] (cmds/device.c#print_device_stat_string); [<missing disk>] is an older btrfs rendering. braid does not depend on either string: the parser ignores the device field and keeps the row’s devid and counters.

At this point there is no mapper and no observable LUKS UUID. Mutating commands that target the missing device, such as remove-missing and missing-path replace, resolve the requested btrfs devid through UUID-keyed membership and fail closed if no persisted member carries that devid.
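The fail-closed devid lookup can be sketched as below. DiskMember.devid is named in the text; the luks_uuid field, the function name resolve_by_devid, and the decision to also refuse when two members carry the same devid are assumptions of this illustration.

```rust
/// Persisted pool member. Only `devid` is named in the text; `luks_uuid`
/// is a hypothetical field standing in for the UUID key.
#[derive(Debug)]
struct DiskMember {
    luks_uuid: String,
    devid: Option<u64>,
}

/// Resolves a btrfs devid to exactly one persisted member, or fails closed.
fn resolve_by_devid(members: &[DiskMember], devid: u64) -> Result<&DiskMember, String> {
    let mut matches = members.iter().filter(|m| m.devid == Some(devid));
    match (matches.next(), matches.next()) {
        (Some(member), None) => Ok(member),
        (None, _) => Err(format!("no persisted member carries devid {}", devid)),
        (Some(_), Some(_)) => Err(format!("devid {} matches more than one member", devid)),
    }
}
```

Nothing in this lookup inspects a mapper name or LUKS label, which matches the rule that membership must not be inferred from either.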

Transitions

The typical progression for a hot-unplug:

Healthy → Null-underlying → MISSING with path(?) → Fully gone

The transitions depend on timing, I/O activity, and whether the kernel tears down the LUKS mapper. A brief unplug-replug might only reach Null-underlying before recovering. A permanent removal eventually reaches Fully gone.

The transition from Null-underlying to MISSING with path is the least understood. It likely happens when btrfs attempts I/O on the device and gets errors, then marks it missing — but the mapper path is still in kernel memory so btrfs remembers it.

Code Pointers

  • probe_pool: cli/src/probe.rs – builds pool.devices, pool.null_underlying, pool.missing_devids
  • btrfs filesystem show parser: cli/src/parse/btrfs_filesystem_show.rs – filters MISSING devices from devices list
  • btrfs device stats parser: cli/src/parse/btrfs_device_stats.rs – propagates devid as the btrfs-native stats row key and ignores the display-only device string
  • alert computation: cli/src/alert.rs – compute_alert_state and snapshot_current key by dev.devid from the parsed stats row; compute_alert_state skips alert-local missing devids for BtrfsDeviceErrors

smartd alert conditions

Reference for what triggers smartd to call the notification script.

braid’s current smartd config

-a -o on -S on -m <nomailer> -M exec ${smartdAlertScript}

-a expands to: -H -f -t -l error -l selftest -l selfteststs -C 197 -U 198

-o on and -S on are non-monitoring config flags (enable offline testing and attribute autosave on the drive).

Wired in modules/braid/monitor.nix (search for smartdAlertScript).

SATA: conditions that fire the alert script

smartd polls every 30 minutes. Each condition has a SMARTD_FAILTYPE value passed to the script.

| SMARTD_FAILTYPE | Directive | Trigger |
|---|---|---|
| Health | -H | Overall SMART health status = FAILING |
| Usage | -f | Any Usage (Old_age) attribute value <= vendor threshold |
| ErrorCount | -l error | ATA error log count increased since last poll |
| SelfTest | -l selftest | New self-test failures detected |
| CurrentPendingSector | -C 197 | Non-zero raw value on attr 197 |
| OfflineUncorrectableSector | -U 198 | Non-zero raw value on attr 198 |
| FailedHealthCheck | -H | SMART health command itself failed |
| FailedReadSmartData | | Could not read SMART attribute data |
| FailedReadSmartErrorLog | | Could not read SMART error log |
| FailedReadSmartSelfTestLog | | Could not read self-test log |
| FailedOpenDevice | | open() failed – device disappeared |
| Temperature | -W | Temperature >= CRIT threshold (NOT in -a, must be added explicitly) |

SATA: what -a does NOT alert on

These are only logged to syslog, not sent to the script:

  • Reallocated_Sector_Ct (5) raw value increases – only alerted if value crosses the vendor threshold (via -f). To alert on raw value changes, add -R 5!.
  • Reported_Uncorrect (187), End-to-End_Error (184), Reallocated_Event_Count (196) – same: threshold breach only via -f, no raw-value alerts.
  • Temperature – not monitored at all without -W DIFF,INFO,CRIT.
  • Prefail/Usage attribute value changes: -t (= -p -u) logs these to syslog at LOG_INFO, but does not fire the script.

SATA: syslog-only directives (no script trigger)

| Directive | What it monitors |
|---|---|
| -p | Prefail attribute value changes (LOG_INFO) |
| -u | Usage attribute value changes (LOG_INFO) |
| -t | All attribute changes (= -p -u) |
| -r ID | Report raw value alongside normalized (informational) |
| -R ID (without !) | Track raw value changes (LOG_INFO, no email) |
| -R ID! (with !) | Track raw value changes (LOG_CRIT + fires script) |
| -l offlinests | Offline Data Collection status changes (LOG_CRIT, no email) |
| -l selfteststs | Self-Test execution status changes (LOG_CRIT, no email) |

NVMe: how -a works differently

NVMe has a standardized health model – no vendor-specific attribute IDs or thresholds. The ATA-only parts of the config line (-C 197 and -U 198 from -a, plus -o on and -S on) are silently ignored.

NVMe conditions that fire the alert script

| SMARTD_FAILTYPE | Directive | Trigger |
|---|---|---|
| Health | -H | Critical Warning byte != 0 (any bit set) |
| Usage | -f | Percentage Used > 95% or Media and Data Integrity Errors increased |
| ErrorCount | -l error | Error Information Log Entries count increased (device-related errors only, since smartmontools 7.4) |
| SelfTest | -l selftest | New self-test failures (requires smartmontools 7.5+) |
| FailedHealthCheck | -H | SMART health command itself failed |
| FailedReadSmartData | | Could not read SMART data |
| FailedReadSmartErrorLog | | Could not read error log |
| FailedReadSmartSelfTestLog | | Could not read self-test log |
| FailedOpenDevice | | open() failed – device disappeared |

The Critical Warning byte (-H)

A bitmask where any bit set fires the alert:

| Bit | Meaning |
|---|---|
| 0 | Available spare fallen below threshold |
| 1 | Temperature above/below acceptable range |
| 2 | Reliability degraded (excessive writes beyond warranty) |
| 3 | Media placed in read-only mode |
| 4 | Volatile memory backup (power-loss protection capacitor) failed |

As of smartmontools 7.5, -H MASK (hex) can ignore specific bits, e.g. -H 0xfb ignores bit 2 (reliability/warranty warning).
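The mask arithmetic can be checked with a short sketch. This models only the bit logic described above, not smartd itself; the function names are hypothetical and the bit descriptions are shortened from the table.

```rust
/// Critical Warning bits and short descriptions, following the table above.
const CRITICAL_WARNING_BITS: [(u8, &str); 5] = [
    (0, "available spare below threshold"),
    (1, "temperature out of range"),
    (2, "reliability degraded"),
    (3, "media read-only"),
    (4, "volatile memory backup failed"),
];

/// True when any bit selected by `mask` is set in the Critical Warning byte.
/// A mask of 0xff checks every bit; 0xfb leaves bit 2 out.
fn fires_alert(critical_warning: u8, mask: u8) -> bool {
    (critical_warning & mask) != 0
}

/// Lists the descriptions of every bit set in the byte.
fn set_warnings(critical_warning: u8) -> Vec<&'static str> {
    CRITICAL_WARNING_BITS
        .iter()
        .filter(|(bit, _)| (critical_warning & (1u8 << *bit)) != 0)
        .map(|(_, name)| *name)
        .collect()
}
```

Since 0xfb is 1111 1011 in binary, a byte with only bit 2 set (0x04) does not fire under that mask, while any other set bit still does.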

NVMe syslog-only tracking (no script trigger)

| Directive | What it monitors |
|---|---|
| -p | Available Spare changes (LOG_INFO) |
| -u | Percentage Used and Media Errors changes (LOG_INFO) |
| -t | All of the above (= -p -u) |
| -l selfteststs | Self-test execution status changes (LOG_CRIT, no email) |

NVMe vs SATA summary

NVMe monitoring is more straightforward because the spec defines exactly what “unhealthy” means, whereas SATA relies on vendor-specific attribute definitions and generously-set thresholds. The same -a config line works for both – smartd adapts per device type.

SATA attributes worth monitoring

Based on real Seagate SATA output.

Reliable indicators (unambiguous, no vendor-encoding issues)

| ID | Name | Notes |
|---|---|---|
| 5 | Reallocated_Sector_Ct | Sectors remapped due to read errors |
| 184 | End-to-End_Error | Internal data path integrity failure. Non-zero = serious. |
| 187 | Reported_Uncorrect | Uncorrectable errors reported to host. Non-zero = data loss occurred. |
| 196 | Reallocated_Event_Count | Remap operations (complements attr 5). Non-zero = active reallocation. |
| 197 | Current_Pending_Sector | Sectors waiting to be remapped |
| 198 | Offline_Uncorrectable | Sectors unreadable during offline test |

Useful but with caveats

| ID | Name | Notes |
|---|---|---|
| 10 | Spin_Retry_Count | Failed spin-up. Non-zero = mechanical trouble. |
| 188 | Command_Timeout | High values = dying drive, but some timeouts normal during power events. |

Avoid using raw values for comparison

| ID | Name | Why |
|---|---|---|
| 1 | Raw_Read_Error_Rate | Seagate packs composite value (errors in lower bits, total ops in upper). Raw number is meaningless for threshold comparison. Other vendors vary too. |
| 7 | Seek_Error_Rate | Same Seagate composite encoding. |

Not disk errors

| ID | Name | Why |
|---|---|---|
| 191 | G-Sense_Error_Rate | Shock sensor. Low values normal for a moved drive. |
| 193 | Load_Cycle_Count | Wear indicator, not an error. |
| 199 | UDMA_CRC_Error_Count | Almost always a cable/connection problem, not the drive. |

Relationship to braid’s live SMART classifier (SmartEvidence)

braid runs its own live SMART probe: parse_smartctl (in cli/src/parse/smartctl.rs) builds a SmartEvidence from smartctl -H -A --json output, reading the raw values of three ATA attributes: Reallocated_Sector_Ct, Current_Pending_Sector, and Offline_Uncorrectable (plus the NVMe health-information log on NVMe drives). The resulting verdict feeds both braid status and the TUI – the same per-disk probe surfaces in status output (the smart JSON object and the SMART: text line) and in the TUI disk-detail panel.

This is complementary to smartd, not a replacement: smartd handles real-time alerts (with its own set of checks), while braid’s classifier gives at-a-glance diagnostic status. Critically, the live classifier is diagnostic only – a degraded SmartEvidence never raises an AlertCause. smartd remains the sole SMART alert source (it writes the smartd-alert flag that drives AlertCause::SmartdAlert); see ADR 014 and ADR 030. The two SMART signals don’t need to be identical but should cover the same ground between them.

SATA Hot-Unplug and Replug Behavior

Empirical observations from physical hardware testing. Validates the device state model in tool-behavior/device-disappearance.md.

Hardware

  • Machine: Silverstone NAS (hunk)
  • Drives: 3x SATA HDD in btrfs RAID1 over LUKS
  • Disk removed: ccc (ST500LM021, devid 3, wwn-0x5000c500ba0a8b52, LUKS label braid-ccc)
  • OS: NixOS with braid module

Detection signals and latencies

How fast each layer notices the disk is gone, and what passive signals are available without user-initiated I/O.

| Signal | Latency | Passive? | Programmatic detection |
|---|---|---|---|
| ata*: SATA link down (kernel journal) | Instant | Yes | journalctl -kf pattern match |
| udev remove event | ~11s (after SATA retries) | Yes | udev rule on ACTION=="remove" |
| /dev/disk/by-id/wwn-* symlink disappears | ~11s (udev cleans it) | Yes | inotify on /dev/disk/by-id/ |
| cryptsetup status shows device: (null) | ~11s | Yes | poll cryptsetup status |
| btrfs write errors (periodic commit) | ~26s | Yes | journalctl -kf pattern match |
| btrfs device stats shows nonzero errors | ~26s+ | Needs query | btrfs device stats |

Key takeaway: the kernel journal and udev events are the fastest passive signals. btrfs is completely oblivious until its next periodic commit (~30s default), but then notices on its own without user-initiated I/O.

The udev remove event is especially useful – it includes ID_WWN and ID_FS_LABEL (e.g. braid-ccc), so a udev rule can immediately identify which braid disk disappeared.

What does NOT react

  • LUKS mapper (/dev/mapper/braid-ccc): stays as a zombie. cryptsetup status still says “active” but the backing device: becomes (null). I/O through it fails.
  • btrfs filesystem show: continues to list all 3 devices with paths and sizes even after errors. Never reports the device as missing from this command alone.

udev remove event (raw)

Arrives after the SATA retries complete (~11s). Includes disk identity:

KERNEL[1395.061297] remove   /devices/pci0000:00/0000:00:01.2/0000:02:00.1/ata2/host1/target1:0:0/1:0:0:0/block/sda (block)
ACTION=remove
DEVNAME=/dev/sda
DEVTYPE=disk

UDEV  [1395.091944] remove   /devices/pci0000:00/0000:00:01.2/0000:02:00.1/ata2/host1/target1:0:0/1:0:0:0/block/sda (block)
ACTION=remove
DEVNAME=/dev/sda
ID_WWN=0x5000c500ba0a8b52
ID_FS_LABEL=braid-ccc
ID_FS_TYPE=crypto_LUKS
DEVLINKS=... /dev/disk/by-id/wwn-0x5000c500ba0a8b52 ... /dev/disk/by-label/braid-ccc ...
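
A consumer of these events could pull the disk identity out of the property lines. The following is a minimal Python sketch, not braid's implementation: parse_udev_event and removed_braid_disk are hypothetical helper names, and the "braid-" label prefix check is an assumption based on the braid-ccc label seen above.

```python
def parse_udev_event(block):
    """Parse the KEY=VALUE property lines of one udev event block."""
    props = {}
    for line in block.splitlines():
        key, sep, value = line.strip().partition("=")
        # Header lines ("UDEV  [..] remove ...") carry no KEY=VALUE pair.
        if sep and " " not in key:
            props[key] = value
    return props


def removed_braid_disk(props):
    """Return the LUKS label if this is a remove event for a braid disk, else None."""
    label = props.get("ID_FS_LABEL", "")
    if (
        props.get("ACTION") == "remove"
        and props.get("ID_FS_TYPE") == "crypto_LUKS"
        and label.startswith("braid-")  # assumption: braid labels use this prefix
    ):
        return label
    return None


# Property lines copied from the UDEV event above (header line omitted).
EVENT = """\
ACTION=remove
DEVNAME=/dev/sda
ID_WWN=0x5000c500ba0a8b52
ID_FS_LABEL=braid-ccc
ID_FS_TYPE=crypto_LUKS
"""

props = parse_udev_event(EVENT)
print(removed_braid_disk(props), props["ID_WWN"])
```

Note that only the UDEV-stage event carries ID_WWN and ID_FS_LABEL; the earlier KERNEL-stage event has just the device node, so this sketch returns None for it.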

cryptsetup status (zombie mapper)

After the block device is gone, the LUKS mapper lingers but its backing device is null:

/dev/mapper/braid-ccc is active and is in use.
  type:    n/a
  cipher:  aes-xts-plain64
  device:  (null)
  mode:    read/write
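
Detecting this zombie state from the command output comes down to two facts: the mapper is still reported active, and its device field reads (null). A hedged Python sketch of that check follows; the function names are hypothetical and the parsing assumes the indented "key: value" layout shown above.

```python
import re


def parse_cryptsetup_status(text):
    """Collect the indented 'key: value' fields of `cryptsetup status` output."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"^\s+([\w ]+?):\s+(.*)$", line)
        if match:
            fields[match.group(1)] = match.group(2).strip()
    return fields


def is_null_underlying(text):
    """True when the mapper is active but its backing device has vanished."""
    lines = text.splitlines()
    first = lines[0] if lines else ""
    fields = parse_cryptsetup_status(text)
    return "is active" in first and fields.get("device") == "(null)"


ZOMBIE = """\
/dev/mapper/braid-ccc is active and is in use.
  type:    n/a
  cipher:  aes-xts-plain64
  device:  (null)
  mode:    read/write
"""

print(is_null_underlying(ZOMBIE))
```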

btrfs device stats (after errors)

[/dev/mapper/braid-ccc].write_io_errs    10
[/dev/mapper/braid-ccc].read_io_errs     0
[/dev/mapper/braid-ccc].flush_io_errs    1
[/dev/mapper/braid-ccc].corruption_errs  0
[/dev/mapper/braid-ccc].generation_errs  0
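
Because this signal needs a query, a poller has to parse the counters and flag any device with a nonzero value. A small Python sketch under that assumption (parse_device_stats and devices_with_errors are illustrative names, not braid's parser):

```python
import re
from collections import defaultdict

# Matches lines such as "[/dev/mapper/braid-ccc].write_io_errs    10".
STAT_LINE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)$")


def parse_device_stats(text):
    """Return {device: {counter: value}} from `btrfs device stats` output."""
    stats = defaultdict(dict)
    for line in text.splitlines():
        match = STAT_LINE.match(line.strip())
        if match:
            stats[match["dev"]][match["counter"]] = int(match["value"])
    return dict(stats)


def devices_with_errors(stats):
    """Devices that have at least one nonzero error counter."""
    return sorted(dev for dev, counters in stats.items() if any(counters.values()))


SAMPLE = """\
[/dev/mapper/braid-ccc].write_io_errs    10
[/dev/mapper/braid-ccc].read_io_errs     0
[/dev/mapper/braid-ccc].flush_io_errs    1
[/dev/mapper/braid-ccc].corruption_errs  0
[/dev/mapper/braid-ccc].generation_errs  0
"""

stats = parse_device_stats(SAMPLE)
print(devices_with_errors(stats))
```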

Test: SATA Hot-Unplug (disk removed while pool mounted)

Immediate state (seconds after unplug)

| Tool | Output |
|---|---|
| btrfs filesystem show | Still lists path /dev/mapper/braid-ccc, with no MISSING suffix |
| btrfs device stats | Still lists [/dev/mapper/braid-ccc], not <missing disk> |
| cryptsetup status braid-ccc | active and is in use, device: (null) |
| braid status | DEGRADED, ccc = missing |
| braid monitor | Exit 1 (alert), clean MissingDevice { devid: 3 } |

Conclusion: Immediate hot-unplug enters the null-underlying state. btrfs doesn’t know the device is gone — it still reports the mapper path. Only cryptsetup detects the loss (underlying block device vanished). braid’s null-underlying detection handles this correctly.

State after ~5 minutes (still unplugged)

No change. btrfs filesystem show still reports the path without MISSING. btrfs doesn’t transition to the MISSING state on its own without I/O pressure. The null-underlying state is stable for at least minutes.

Kernel perspective (dmesg)

[ 3431s] ata1: SATA link down (SStatus 0 SControl 300)
[ 3437s] ata1: SATA link down — limiting SATA link speed
[ 3442s] ata1.00: disable device, detaching (SCSI 0:0:0:0)
[ 3442s] sd 0:0:0:0: [sdc] Synchronize Cache failed: DID_BAD_TARGET

Kernel detects the link-down within seconds and detaches the SCSI device. The LUKS mapper (dm-2) stays open — dm-crypt doesn’t tear down when the underlying device vanishes.

Test: SATA Replug (disk reconnected)

State after replug

| Tool | Output |
|---|---|
| btrfs filesystem show | Still lists path /dev/mapper/braid-ccc (unchanged) |
| btrfs device stats | Still lists [/dev/mapper/braid-ccc] (unchanged) |
| cryptsetup status braid-ccc | Still device: (null); does NOT recover |
| braid status | ccc still shows as missing / UNKNOWN |
| Physical device | Back as /dev/sde (was /dev/sdc before unplug) |

Key finding: The LUKS mapper does not recover from null-underlying after replug. The dm-crypt target was /dev/sdc, but the kernel re-attached the disk as /dev/sde. The mapper is permanently broken until closed and reopened.

Kernel perspective (dmesg)

[ 3744s] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 3744s] ata1.00: ATA-8: ST500LM021-1KJ152
[ 3744s] sd 0:0:0:0: [sde] 976773168 512-byte logical blocks
[ 3744s] sd 0:0:0:0: [sde] Attached SCSI disk

Kernel sees the disk on the same ATA port but assigns a new SCSI device node (sde instead of sdc).

Recovery path

The broken LUKS mapper cannot self-heal. Recovery requires:

  1. braid ack to silence the alert
  2. Reboot → braid unlock (reopens LUKS mappers using stable /dev/disk/by-id/ paths)

This is correct behavior — braid uses by-id paths for LUKS open, so a reboot always rebinds to the right device regardless of kernel device node assignment.

Unanswered Questions

  • MISSING-with-path state: We never observed btrfs filesystem show report path /dev/mapper/X MISSING during these tests. This state may require sustained I/O errors or a degraded mount (reboot with disk missing). The ?? in the device state table for what btrfs device stats reports in this state remains unverified.
  • Time to MISSING transition: btrfs didn’t transition from null-underlying to MISSING within 5 minutes of idle. It may require write pressure or a longer timeout.
  • Replug with same device node: We didn’t test whether cryptsetup recovers if the kernel assigns the same /dev/sdX path after replug. Unlikely in practice since the kernel increments device letters.

Validated Code Paths

Changes to these should prompt re-verification of this document:

  • cli/src/probe.rs – probe_pool() null-underlying detection (lines 190-206)
  • cli/src/monitor.rs – alert-local missing devids union (missing_devids ∪ null_underlying devids)
  • cli/src/alert.rs – compute_alert_state / snapshot_current (devid-keyed; no path-to-devid map)
  • cli/src/parse/btrfs_filesystem_show.rs – MISSING device filtering (line 116)
  • cli/src/parse/btrfs_device_stats.rs – devid propagation and <missing disk> / devid:<n> sentinel handling

btrfs balance: profile conversions and block group types

Block group types

btrfs has three block group types, each with an independent RAID profile:

| Type | Contents | Default (1 device) | Default (2+ devices) |
|---|---|---|---|
| data | File contents | single | single |
| metadata | Inodes, directory entries, extent tree | dup | raid1 |
| system | Chunk tree (maps virtual → physical addresses) | dup | raid1 |

System chunks follow metadata automatically

When converting profiles with btrfs balance start -mconvert=<profile>, system chunks are converted alongside metadata. You do not need to pass -sconvert=<profile> separately.

The -s flag exists for converting system chunks independently of metadata, which requires -f because btrfs considers it dangerous.

This means our standard conversion commands are complete:

# single → RAID1 (after adding 2nd device)
btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/storage

# RAID1 → single (before removing last redundant device)
btrfs balance start -dconvert=single -mconvert=dup -f /mnt/storage

Note the asymmetry in the second command: data converts to single, but metadata converts to dup, not single. A one-device pool keeps two same-disk copies of metadata (and system chunks), matching what mkfs.btrfs lays down for a fresh single-device filesystem (see the table above). -mconvert=single would leave metadata with a single unprotected copy. The -f is required here because reducing metadata from RAID1 to dup lowers redundancy – btrfs refuses that without a force flag – not because of the -s independent-conversion case discussed above.

Why system chunks matter

System chunks contain the chunk tree — the structure that maps virtual addresses to physical device locations. Losing the only copy of a system chunk usually means losing the entire filesystem. On a multi-device pool, having system chunks in RAID1 ensures this map survives a single device failure.

btrfs balance: the soft flag

What soft does

soft is a per-type modifier for convert= filters. From btrfs-progs Documentation/btrfs-balance.rst (version 6.19.1, tag v6.19.1, commit fa79dbea32d39ac0ae41a88a079013c7ad2a8a58): “When doing convert from one profile to another and soft mode is on, chunks that already have the target profile are left untouched.”

btrfs balance start -dconvert=raid1,soft -mconvert=raid1,soft /mnt/storage

Without soft, every block group is rewritten regardless of its current profile. With soft, only block groups whose profile differs from the target are touched. The switch is per-type, so -dconvert and -mconvert apply it independently.

soft keys on the profile tag alone, not on data distribution: a chunk tagged raid1 is skipped even if both copies happen to live on a subset of the devices. That distinction is exactly why braid uses hard convert in one place and soft in another.

Where braid uses hard vs soft

braid issues two different RAID1 convert-balances. The choice of soft is deliberate in each.

Hard convert – growing the pool (braid add, 3rd+ device)

braid add of a 3rd-or-later device runs a HARD -dconvert=raid1 (pool_balance_raid1, emitting BtrfsBalanceRaid1). Soft would be wrong here:

  1. Pool has devices A, B – all chunks are raid1 across A and B.
  2. Add device C.
  3. -dconvert=raid1,soft – every chunk is already raid1, so soft skips them all. Balance is a no-op.
  4. Device C sits empty. Existing data still has zero copies on C.

A hard rewrite rewrites every chunk, redistributing copies across all three devices – which is the whole point of balancing after a device add. (A 1->2 add converts the existing single chunks either way, so the distinction only bites at the 3rd+ device.)
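
The no-op can be illustrated with a toy model. This Python sketch is not btrfs's allocator: it assumes a rewritten chunk lands on the two least-used devices, and it represents each chunk as a (profile tag, devices) pair. The only point is that soft keys on the profile tag and so never touches chunks already tagged raid1.

```python
from collections import Counter


def convert_raid1(chunks, devices, soft):
    """Toy raid1 convert-balance over chunks given as (profile, devices) pairs."""
    usage = Counter({dev: 0 for dev in devices})
    for _, devs in chunks:
        usage.update(devs)
    result = []
    for profile, devs in chunks:
        if soft and profile == "raid1":
            # soft: the profile tag already matches, so the chunk is skipped.
            result.append((profile, devs))
            continue
        # Rewrite: release the old copies, then place two new copies on the
        # two least-used devices (assumption standing in for the allocator).
        usage.subtract(devs)
        targets = tuple(sorted(sorted(devices, key=lambda d: (usage[d], d))[:2]))
        usage.update(targets)
        result.append(("raid1", targets))
    return result


def chunks_on(chunks, device):
    """Number of chunks that keep a copy on the given device."""
    return sum(1 for _, devs in chunks if device in devs)


pool = [("raid1", ("A", "B"))] * 4   # existing pool: everything mirrored on A and B
devices = ["A", "B", "C"]            # C was just added

after_soft = convert_raid1(pool, devices, soft=True)
after_hard = convert_raid1(pool, devices, soft=False)
print(chunks_on(after_soft, "C"), chunks_on(after_hard, "C"))
```

In this model the soft run leaves C with no chunks, while the hard run moves some copies onto it.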

Soft convert – converting leftover single chunks

btrfs allocates a single chunk (one copy) only when it cannot place two copies on two devices – i.e. when a RAID1 pool has fewer than two devices present for allocation. The common case is a 2-disk pool mounted degraded on its one surviving device: new writes land as single. A larger pool that still has two survivors keeps allocating raid1 – a 3-disk pool degraded to two creates no single chunks – so this conversion is only ever needed for chunks written while fewer than two devices were available.

Once the pool is whole again, those single chunks must be converted back to raid1 to restore redundancy. braid runs a SOFT -dconvert=raid1,soft (pool_balance_raid1_soft, emitting BtrfsBalanceRaid1Soft): it converts exactly the single chunks and skips everything already raid1. Because soft skips matching chunks, the balance is idempotent and cheap – a near no-op when there is nothing to convert – so braid runs it as cleanup without first checking whether any single chunks exist.

braid issues this soft balance from two code paths:

  • Live restore – maybe_restore_raid1 (cli/src/pool.rs), invoked by remove-missing and by replace’s missing path once the operation clears the last missing device.
  • Recover replay – replay_owed_raid1_maintenance (cli/src/recover.rs), described below.

replace itself uses btrfs replace start (atomic), not add+balance+remove (see ADR-001), so this soft balance is the only convert-balance in the replace path.

Skip – degraded add (missing member present)

braid add into a pool that still has a missing member runs NO convert balance at all. The post-add present-device count can already be >= 2 (a 2-disk RAID1 with one member missing, plus the fresh disk), which would otherwise trip the hard convert above; braid gates it off on missing_count > 0 and surfaces a single [skip] note instead. The skip is applied symmetrically in cli/src/add.rs: plan_add pushes one PreviewNote::Skip, and the preview step builder (AddWorkPlan::render_steps) and the execute balance gate (AddPlan::execute) both carry the same missing_count == 0 condition so dry-run and real-run agree.

This is a deliberate deferral, not a hazard fix. The hard convert does succeed on a degraded pool today – btrfs device add works on a degraded mount and the convert rewrites every chunk across the present devices – but it rewrites all data through the allocator while the pool has no redundancy, a longer and less-targeted operation than the purpose-built btrfs replace. braid instead defers redundancy restoration to the repair step: remove-missing (which relocates data onto the new disk and runs the soft balance above) or replace. The soft convert, by contrast, is left running even on a degraded pool – it only converts single -> raid1 and never rewrites existing raid1 chunks, so it cannot do a full degraded rewrite and is safe and beneficial there.

Skipping at add also makes the degraded-add interrupt paths converge. With no hard balance issued, a completed degraded add and every recover path end at the same state: device added, pool still degraded, redundancy deferred to the repair step. Before this change the paths diverged: a completed degraded add restored redundancy via the hard balance, but recover could only safely replay owed RAID1 maintenance when no paused balance survived the interruption. Skipping at add closes that divergent path by making degraded-add recovery end in the same deferred-repair state.

btrfs-progs guidance backs the deferral. btrfs-balance.rst (in Sources) recommends you “use :command:btrfs replace or :command:btrfs device remove to handle the failing/missing device first.” We lean on that as general guidance, not a strong prohibition. Its acute warning is narrower: it concerns converting to a profile with lower redundancy (RAID1 -> SINGLE) while a present-but-failing device is in the pool, a harsher case than our convert to raid1 with a cleanly missing member.

Recover replay

After a forced shutdown mid-mutation, braid recover replays owed RAID1 maintenance only if btrfs balance status reports no active balance:

Warning

Replaying a crash-paused RAID1 balance can underflow btrfs block-group accounting and silently halve redundancy. recover preserves pending-op.json instead of automating recovery when the balance state is paused, running, or unknown.

On any pool with two or more devices, the idle/no-paused path runs the soft balance above to catch single chunks an interrupted balance left behind. The idempotent ,soft filter makes this safe even when nothing needs converting.

This replay fires for an interrupted add when the balance state is idle – the new disk is already in the pool, so re-running braid add would refuse, and recover finishes the job so the operator is not left with single chunks – and for the idle/no-paused owed post-maintenance step of remove-missing and replace.

braid remove is deliberately not part of this replay. It is the only mutation whose pre-mutation phase can issue a balance – the RAID1 -> single conversion in the 2->1 case. A paused balance found while recovering a remove may be that unfinished conversion-to-single, not owed RAID1 maintenance, so recover neither resumes nor soft-replays it. Resuming it would finish converting to single without removing the device, then clear the journal, silently halving redundancy. Recover instead directs the operator to re-run braid remove.
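
The replay rules in this section reduce to a small decision table. The Python sketch below restates them under stated assumptions: the enum and function names are hypothetical, the operation is passed as a plain string, and the "two or more devices" condition is left out for brevity. It is a reading aid, not the logic in cli/src/recover.rs.

```python
from enum import Enum, auto


class BalanceState(Enum):
    IDLE = auto()
    PAUSED = auto()
    RUNNING = auto()
    UNKNOWN = auto()


class RecoverAction(Enum):
    SOFT_REPLAY = auto()        # run the idempotent raid1,soft balance
    PRESERVE_JOURNAL = auto()   # keep pending-op.json, do not automate
    RERUN_REMOVE = auto()       # direct the operator to re-run braid remove


def plan_recover(op, balance):
    """Map an interrupted operation and balance state to a recover action."""
    if op == "remove":
        # A paused balance here may be the unfinished RAID1 -> single
        # conversion, so recover neither resumes nor soft-replays it.
        return RecoverAction.RERUN_REMOVE
    if balance is BalanceState.IDLE:
        return RecoverAction.SOFT_REPLAY
    # Paused, running, or unknown: replaying could halve redundancy.
    return RecoverAction.PRESERVE_JOURNAL


for op in ("add", "remove-missing", "replace", "remove"):
    print(op, plan_recover(op, BalanceState.PAUSED).name)
```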

Sources

  • btrfs-progs Documentation/btrfs-balance.rst, version 6.19.1, tag v6.19.1, commit fa79dbea32d39ac0ae41a88a079013c7ad2a8a58 – soft filter semantics.
  • btrfs-progs Documentation/btrfs-man5.rst, version 6.19.1, tag v6.19.1, commit fa79dbea32d39ac0ae41a88a079013c7ad2a8a58 – degraded mounts and mixed block group profiles.
  • braid: ADR-001 btrfs RAID1 (replacement strategy, add+balance+remove rejected), design principles (degraded restore), and the replace / remove-missing command docs.

ENOSPC vs hang: reproducing btrfs device remove failures in VMs

Background

btrfs device remove missing has two failure modes when surviving devices lack space for relocation. Both are bad, but the second is catastrophic.

Failure mode 1: instant ENOSPC

Conditions: surviving devices have zero (or near-zero) unallocated space.

btrfs can’t even begin relocating block groups. It fails immediately:

ERROR: error removing device 'missing': No space left on device

Filesystem stays healthy and writable. Annoying but recoverable.

How to reproduce in a VM: 3×512MiB disks, fill to 100% capacity, kill one disk. btrfs device remove missing fails in under a second.

Failure mode 2: partial relocation → transaction abort → forced read-only

Conditions: surviving devices have SOME unallocated space (hundreds of MiB) but not enough to relocate ALL block groups from the dead device.

btrfs starts relocating, successfully moves some block groups (consuming the free space), then hits ENOSPC mid-transaction on a subsequent block group. The transaction abort forces the entire filesystem read-only:

BTRFS info: relocating block group 4761583616 flags data|raid1
BTRFS info: found 20 extents, stage: move data extents
BTRFS info: found 20 extents, stage: update data pointers
BTRFS info: relocating block group 3419406336 flags metadata|raid1
BTRFS: Transaction aborted (error -28)
BTRFS: error in __btrfs_free_extent: errno=-28 No space left
BTRFS info: forced readonly

The error reported to the user is “Read-only file system”; the underlying ENOSPC is buried in dmesg. The filesystem stays forced read-only for the rest of that mount session and needs a remount or reboot to recover.

On real hardware with slow USB drives, btrfs doesn’t crash quickly — it spends hours doing I/O, throttled by writeback queuing (wbt_wait), retrying the same block groups before eventually aborting. In a VM with fast virtual disks, the same sequence completes in ~40 seconds.

What makes the difference

The variable is whether btrfs can begin relocating:

| Free space on survivors | btrfs behavior | Outcome |
|---|---|---|
| ~0 | Can’t start → instant ENOSPC | Filesystem OK |
| Some but not enough | Starts, partially succeeds, then ENOSPC mid-transaction | Filesystem destroyed (forced read-only) |
| Enough | Completes relocation | Success |

The dangerous middle case is the one that happened in the real incident (3×8GiB USB drives, ~80% full, one died).

How braid avoids this

braid’s mutation preflight refuses these removals — before the pending-op journal is written — whenever it can prove the survivors lack the space to absorb the target’s allocations. The degraded failure-mode-2 path is fully guarded: remove-missing and the 2→1 eviction are fail-closed, so an operator using braid does not reach the catastrophic path above. The healthy >=2-survivor case is intentionally warn-and-proceed on an unprovable check, because it falls through to btrfs’s clean failure mode 1, never the mode-2 abort. Per path:

  • remove-missing — the degraded failure-mode-2 scenario exactly. Computes RAID1 chunk-pair capacity on the survivors and refuses when it is below the chunks allocated on the missing device. Fail-closed: any probe or parse uncertainty also refuses (cli/src/preflight.rs::check_raid1_relocation_space, wired in cli/src/remove_missing.rs).
  • remove evicting to a single survivor (2→1) — RAID1 no longer applies, so braid instead checks the lone survivor can hold the post-conversion data + 2 × metadata + 2 × system (single + DUP profile). Fail-closed (cli/src/preflight.rs#check_single_survivor_capacity). Enforced at plan time and re-validated as a pre-journal gate in cli/src/remove.rs#RemovePlan::execute, closing the drift window where the pool keeps taking writes while the operator idles at the confirmation prompt — an over-committed survivor is then refused before the irreversible -f balance, still with no pending-op.json stranded.
  • remove with >=2 survivors (healthy) — same RAID1 relocation check, but warn-and-proceed on probe/parse uncertainty. A best-effort miss here falls through to btrfs device remove, which hits the clean failure mode 1 (instant ENOSPC), not the failure-mode-2 abort, so the filesystem stays intact.
  • replace is not subject to this failure mode. btrfs replace rebuilds onto the new disk instead of relocating onto survivors; its preflight refuses a new disk smaller than the one being replaced (cli/src/preflight.rs::check_replace_target_capacity).
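
The two capacity checks above can be sketched numerically. This Python example is an approximation under stated assumptions, not the code in cli/src/preflight.rs: it uses the usual RAID1 bound (each byte needs a copy on two different devices) and ignores block-group granularity; all function names are hypothetical.

```python
def raid1_pair_capacity(unallocated):
    """Bytes of new raid1 data that fit in the given per-device free space."""
    if len(unallocated) < 2:
        return 0  # raid1 needs two distinct devices
    total = sum(unallocated)
    # Either half of all free space, or everything outside the largest
    # device, whichever is smaller.
    return min(total // 2, total - max(unallocated))


def can_relocate(unallocated_on_survivors, allocated_on_missing):
    """Refuse (False) when survivors cannot absorb the missing device's chunks."""
    return raid1_pair_capacity(unallocated_on_survivors) >= allocated_on_missing


def single_survivor_need(data, metadata, system):
    """2 -> 1 eviction: single data plus DUP metadata and DUP system chunks."""
    return data + 2 * metadata + 2 * system


MIB = 1024 ** 2
# Survivors with 1 MiB and 288 MiB unallocated can pair up only 1 MiB.
print(raid1_pair_capacity([1 * MIB, 288 * MIB]) // MIB)
print(can_relocate([300 * MIB, 300 * MIB], 1024 * MIB))
```

The first case shows why one nearly full survivor dominates: pairing is limited by the smaller device no matter how much room the other has.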

braid status and braid doctor surface a proactive advisory (cli/src/capacity.rs::enospc_risk_advisory) one disk-loss before a pool enters this danger zone.

The policy and its rationale are owned by ADR 012’s “ENOSPC pre-flight check” section (docs/design/decisions/012-intent-cli.md). See also docs/commands/remove-missing.md and the braid status ENOSPC advisory (docs/commands/status.md).

Reproducing the hang/crash in a VM

The tricky part is getting btrfs to land in the “some but not enough” zone. Two challenges:

1. btrfs allocates unevenly across devices

Writing 2GiB of data to a 3-device RAID1 pool doesn’t give you an even allocation per device (two copies of 2GiB is 4GiB raw, which would be roughly 1.33GiB each if split evenly). btrfs allocates block groups in pairs (for RAID1), and the pair selection isn’t perfectly balanced. In testing:

disk1: Unallocated  1.00MiB    ← nearly full
disk2: Unallocated  288.88MiB  ← some room

With one device at ~0 free, btrfs can’t relocate anything there → instant ENOSPC (failure mode 1). To get failure mode 2, BOTH survivors need meaningful free space.

2. Block group granularity

btrfs allocates space in block groups (256MiB on small devices, 1GiB on large ones). A single dd write of 200MiB might or might not trigger a new block group allocation. Writing in smaller chunks (50MiB) gives btrfs more allocation decisions, improving the chance of even distribution.

Working recipe (what the test does)

  1. Use 4GiB disks — large enough for btrfs to create multiple data block groups per device, giving room for partial relocation.

  2. Adaptive fill with small chunks — write 50MiB at a time, check btrfs device usage --raw after each write, stop when the minimum unallocated across all online devices drops below 800MiB. This targets the sweet spot: both survivors have 300-800MiB free.

  3. Use --raw for parsing – btrfs device usage displays values in human units (MiB, GiB) depending on magnitude. --raw gives bytes, avoiding unit-parsing bugs.

  4. Kill disk3, mount degraded, attempt btrfs device remove missing — btrfs starts relocating, succeeds on one block group (~38s of I/O in VM), then crashes on the next with transaction abort.
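
The stop condition in step 2 can be sketched as a parser over btrfs device usage --raw output. This is a hedged Python illustration rather than the test's actual code: the function names are made up, the sample output is trimmed and uses illustrative numbers, and the parser raises when it finds no byte values instead of silently stopping the fill loop.

```python
import re

# With --raw the value is a plain byte count, e.g. "Unallocated:   1048576".
UNALLOCATED = re.compile(r"^\s*Unallocated:\s+(\d+)\s*$", re.MULTILINE)

STOP_BELOW = 800 * 1024 ** 2  # 800 MiB threshold from the recipe


def min_unallocated_bytes(usage_raw):
    """Smallest Unallocated value across all devices in the output."""
    values = [int(v) for v in UNALLOCATED.findall(usage_raw)]
    if not values:
        # Fail loudly: human-unit output (MiB/GiB) would otherwise look like
        # "nothing left to fill".
        raise ValueError("no raw Unallocated values found; was --raw passed?")
    return min(values)


def should_stop_filling(usage_raw, threshold=STOP_BELOW):
    """True once the tightest device drops below the threshold."""
    return min_unallocated_bytes(usage_raw) < threshold


SAMPLE = """\
/dev/vdb, ID: 1
   Data,RAID1:             1073741824
   Unallocated:            2147483648

/dev/vdc, ID: 2
   Data,RAID1:             1073741824
   Unallocated:             734003200
"""

print(min_unallocated_bytes(SAMPLE), should_stop_filling(SAMPLE))
```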

What didn’t work

  • 512MiB disks filled to 100%: instant ENOSPC (failure mode 1). No free space for btrfs to even begin.

  • 2GiB disks with 200MiB write chunks: uneven allocation left one survivor with 1MiB free → instant ENOSPC again.

  • 2GiB disks with adaptive fill: same uneven allocation problem. Not enough total capacity for btrfs to distribute block groups evenly across 3 device pairs.

  • Parsing btrfs device usage without --raw: values display as MiB or GiB depending on size. On fresh 4GiB disks, unallocated shows as GiB; a regex matching only MiB found zero values → fill loop stopped immediately.

Test files

  • tests/repro/btrfs-remove-enospc.nix/.py — failure mode 1 (instant ENOSPC, 3×512MiB)
  • tests/repro/btrfs-remove-enospc-crash.nix/.py — failure mode 2 (partial relocation crash, 3×4GiB)

These are repro tests that document actual btrfs behavior, not TDD tests. They assert the real outcomes: instant ENOSPC with surviving filesystem, or transaction abort with forced read-only. They invoke raw btrfs device remove missing rather than braid precisely because braid’s preflight (see “How braid avoids this” above) refuses the operation under these conditions — reproducing the unguarded btrfs behavior requires bypassing it. They live in tests/repro/ — a folder reserved for tests that reproduce real-world scenarios for our records.

LUKS sector size and btrfs

Summary

braid does not pass --sector-size to cryptsetup luksFormat, and it rejects operator attempts to set it. With the flag omitted, cryptsetup auto-detects the encryption sector size from each device – and that auto-detected value is already the optimal one for the device – so braid never chooses a sector size itself.

What auto-detect picks

When --sector-size is omitted, cryptsetup sizes the LUKS2 encryption sector to the device’s physical sector:

The encryption sector size is set based on the underlying data device if not specified explicitly. For native 4096-byte physical sector devices, it is set to 4096 bytes. For 4096/512e (4096-byte physical sector size with 512-byte sector emulation), it is set to 4096 bytes. For drives reporting only a 512-byte physical sector size, it is set to 512 bytes.

– cryptsetup 2.8.6, man/common_options.adoc (LUKSFORMAT branch)

The rule, in short:

  • 4Kn (native 4096) and 512e (4096/512e) drives -> 4096-byte LUKS sectors
  • drives reporting a 512-byte physical sector -> 512-byte LUKS sectors
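
For checking what a given drive should get, the rule can be written as a tiny function. This Python sketch only restates the quoted rule; it is not how cryptsetup implements detection. The helper names are hypothetical, the sysfs reader assumes the standard Linux /sys/block/<name>/queue/physical_block_size attribute, and sizes the quote does not mention are rejected rather than guessed.

```python
from pathlib import Path


def expected_luks_sector_size(physical_sector_bytes):
    """LUKS2 sector size the quoted auto-detect rule yields for a device."""
    if physical_sector_bytes == 4096:   # 4Kn and 4096/512e drives
        return 4096
    if physical_sector_bytes == 512:    # drives reporting only 512-byte sectors
        return 512
    raise ValueError(f"rule does not cover {physical_sector_bytes}-byte sectors")


def physical_sector_bytes(block_name):
    """Read the physical sector size of e.g. 'sda' from sysfs (Linux only)."""
    path = Path("/sys/block") / block_name / "queue" / "physical_block_size"
    return int(path.read_text().strip())


print(expected_luks_sector_size(4096), expected_luks_sector_size(512))
```

On a NAS host, expected_luks_sector_size(physical_sector_bytes("sda")) could then be compared against the sector size that cryptsetup luksDump reports.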

On our hardware:

  • NAS drives: 8TB+ SATA HDDs (4Kn or 512e) -> 4096-byte LUKS sectors, matching the physical sector.
  • Test drives: USB sticks and VM virtio disks report 512-byte sectors -> 512-byte LUKS sectors. The committed luksDump fixtures (cli/tests/fixtures/nixos-26.05/cryptsetup-luks-dump.json and its nixos-unstable mirror) record "sector_size":512 for exactly this reason: they capture VM disks, not the NAS hardware.

Why braid doesn’t override it

Two reasons:

  1. Auto-detect already yields the optimal value. Setting --sector-size explicitly could at best re-specify what cryptsetup would pick anyway, while adding a format-time parameter that cannot change without re-encrypting the device. There is nothing to gain.

  2. An override could make braid’s capacity estimate unsafe. braid rejects --sector-size passed as a --luks-format-arg override (see cli/src/types.rs#LuksFormatExtraOpts::parse); replace lists it among the on-disk-layout flags it refuses. A non-default sector size can shift the fresh-LUKS payload offset, and braid’s capacity check for a fresh target assumes cryptsetup’s default offset. The scope here is deliberately fresh targets only: the replace-target size preflight covers existing containers, whose capacity is read from the LUKS2 segment and is exact at any sector size.

Aside: even 512-byte LUKS sectors are harmless under btrfs

This section covers the 512-byte LUKS sector case – the test drives above, and the historical worry that motivated --sector-size 4096 in the first place. It does not describe the NAS, which gets 4096-byte LUKS sectors from auto-detect. Even at 512-byte sectors, btrfs sees no read-modify-write penalty.

The three layers

btrfs (always 4096-byte blocks)
  -> LUKS (512 or 4096-byte sectors)
    -> physical disk (512 or 4096-byte sectors)

Why --sector-size 4096 exists

Read-modify-write amplification happens at the physical disk when something writes less than a full physical sector. Example: writing a single 512-byte LUKS sector to a 4096-byte-physical-sector disk forces the disk to read 4096 bytes, modify 512, and write 4096 back.

Why btrfs avoids it even at 512-byte LUKS sectors

btrfs never writes anything smaller than 4096 bytes. Take a 4096-byte btrfs write landing on a LUKS device with 512-byte sectors. dm-crypt encrypts that write as 8 x 512-byte crypto sectors internally – but the internal crypto-sector count is not the I/O count:

dm-crypt does not split the write – it allocates one clone bio for the entire write and submits it downstream as a single bio:

clone = crypt_alloc_buffer(io, io->base_bio->bi_iter.bi_size);

– Linux 6.18.33, drivers/md/dm-crypt.c (kcryptd_crypt_write_convert)

The physical disk therefore receives a full 4096-byte write – no read-modify-write penalty. The only overhead is CPU: 8 IV computations and 8 smaller AES operations instead of 1. With AES-NI doing multiple GB/s, that is negligible next to spinning-disk speeds.
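
The arithmetic can be made explicit. The Python sketch below models only alignment, with hypothetical function names, and assumes the LUKS payload itself starts on a physical-sector boundary: the crypto-sector count affects CPU work, while read-modify-write depends solely on whether the write that reaches the disk covers whole physical sectors.

```python
def crypto_sectors(write_bytes, luks_sector_bytes):
    """How many crypto sectors dm-crypt encrypts for one write (CPU cost only)."""
    return write_bytes // luks_sector_bytes


def needs_read_modify_write(offset, length, physical_sector_bytes):
    """True if the write does not cover whole physical sectors."""
    return offset % physical_sector_bytes != 0 or length % physical_sector_bytes != 0


# A 4096-byte btrfs block on 512-byte LUKS sectors over a 4096-byte disk:
print(crypto_sectors(4096, 512))                # 8
print(needs_read_modify_write(0, 4096, 4096))   # False
# A lone 512-byte write to the same disk would need read-modify-write:
print(needs_read_modify_write(0, 512, 4096))    # True
```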

When --sector-size 4096 would matter

Filesystems that can issue sub-4096 writes: ext4 with 1K blocks, raw dd, database engines doing 512-byte writes. btrfs is not one of them.

btrfs dev_replace resume-on-mount and the recover relock cycle

Background

A btrfs replace interrupted mid-flight by an unclean crash leaves the on-disk dev_replace_item in STARTED. On the next mount, the kernel sees that state and resumes the replace from the on-disk cursor.

For braid, this matters during braid recover: the command may be the first thing to mount the pool after the crash, so it is also the thing that triggers the kernel’s resume-on-mount path.

Kernel resume-on-mount behavior

btrfs_resume_dev_replace_async runs as a detached kthread. umount does not wait for that worker.

The worker commits the post-completion devid swap to disk correctly, but it does not update the in-memory btrfs_fs_devices for the mount session that triggered the resume. A probe taken from that session reads stale topology: a phantom MISSING devid 0 plus both the source and target devices. In the captured failure, that meant five device entries and braid status reporting DEGRADED even though every disk was online.

Why the LUKS close+reopen is load-bearing

The important empirical result is narrower than “remount after replace”: umount + btrfs device scan --forget + remount is not enough if the dm devices stay alive. That cycle can leave the cached fs_devices attached to the live dm devices, so the next probe still sees the stale topology from the original resume-triggering mount.

Only tearing down and recreating the dm devices forces the kernel to re-read the chunk tree from disk and build a fresh fs_devices that reflects the post-resume on-disk state.

What braid recover does

Recover splits this into two explicit work actions: RecoverWorkAction::WaitForKernelReplace and RecoverWorkAction::RemountCycle.

First, cli/src/recover.rs#wait_for_kernel_replace_to_finish polls btrfs replace status until the kernel reports Finished or no replace is in progress. The wait on a Running status is intentionally unbounded, because interrupting the kernel worker would strand the same recovery problem. Suspended or unparseable output fails closed and preserves the journal.

Then cli/src/recover.rs#relock_and_remount runs the full relock cycle: umount, btrfs device scan --forget, close the LUKS membership union, reopen the pool through the standard plan_open_pool flow, and remount through the standard executor. The second mount sees the completed on-disk replace with a fresh fs_devices.
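
The wait step's control flow can be sketched as a polling loop. This Python version is illustrative only: braid is written in Rust, the status enum and names here are hypothetical, and the poll callable stands in for running and parsing btrfs replace status.

```python
import time
from enum import Enum, auto


class ReplaceStatus(Enum):
    RUNNING = auto()
    FINISHED = auto()
    NOT_IN_PROGRESS = auto()
    SUSPENDED = auto()
    UNPARSEABLE = auto()


class ReplaceWaitError(Exception):
    """Raised to fail closed; the caller keeps the journal."""


def wait_for_kernel_replace(poll, sleep=time.sleep, interval=5.0):
    """Poll until the kernel replace is done; unbounded while it is running."""
    while True:
        status = poll()
        if status in (ReplaceStatus.FINISHED, ReplaceStatus.NOT_IN_PROGRESS):
            return status
        if status is ReplaceStatus.RUNNING:
            sleep(interval)  # no timeout: never interrupt the kernel worker
            continue
        # Suspended or unparseable output: stop without guessing.
        raise ReplaceWaitError(f"cannot wait on replace in state {status.name}")


# Usage with canned statuses instead of a real pool.
canned = iter([ReplaceStatus.RUNNING, ReplaceStatus.RUNNING, ReplaceStatus.FINISHED])
print(wait_for_kernel_replace(lambda: next(canned), sleep=lambda _: None).name)
```

Only after this returns would the relock cycle (umount, device scan --forget, LUKS close and reopen, remount) run.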

Coverage

tests/repro/btrfs-replace-interrupted-mid-flight.py pins the unclean-kill path end-to-end. It starts a real braid replace, kills the VM mid-flight, boots again, runs braid recover, and asserts that the resumed replace drains, pool.json swaps in the new disk, the old disk is evicted, and a later braid lock; braid unlock cycle stays clean.

Path B: v6.19+ freeze/signal cancellation

The unclean-kill repro does not cover the v6.19+ freeze and signal cancellation path inside the btrfs replace worker loop. An unclean kernel kill bypasses the in-loop try_to_freeze and fatal_signal_pending checks entirely.

A sibling repro test is needed when kernel >= 6.19 reaches NixOS stable. Its sequencing depends on whether braid replace should inhibit suspend for the operation’s duration; that policy question is orthogonal to the crash path this note documents.

Development

braid is developed test-first with NixOS VM tests. Tests run on macOS via nix.linux-builder.enable = true in nix-darwin (checks are checks.aarch64-darwin).

Dev shell

Enter the pinned braid dev shell before running local commands:

nix develop

The shell includes the Rust toolchain (cargo, rustc, rustfmt, clippy, rust-analyzer), just, and braid’s parser-critical/runtime tools (btrfs-progs, cryptsetup, util-linux, nut).

That shell is Linux-only – it bundles the storage tools (btrfs-progs, cryptsetup, util-linux, nut), which don’t evaluate on darwin – so nix develop resolves only on a Linux host. On macOS, run VM tests through the linux-builder and build the CLI with nix build .#braid-cli-unwrapped (below); nix develop .#docs works on macOS but carries only the docs toolchain (mdbook).

Test workflow

# run one test while iterating
just test-vm braid-add-disk

# run a few specific tests
just test-vm braid-add-disk braid-remove-disk

# verbose VM logs (only when non-verbose output doesn't explain the failure)
just test-vm braid-add-disk -v

# full suite before finishing
just test-vm

# repro tests only
just test-repro

# all tests including repro
just test-all

# rust unit tests
just test-rust

# parser compatibility canary (CLI parsers against live VM tool output)
just test-parsers

Run tests without -v by default. Only add -v to a specific failing test when the output doesn’t explain the failure. Never run just test-vm -v (all tests verbose) – too much output to be useful.

Unstable lane

Early-warning tests against nixos-unstable. Failures signal upcoming changes, not a contract violation.

# VM tests against nixos-unstable
just test-vm braid-add-disk --unstable
just test-all-unstable

# fixture capture + golden tests against unstable
just capture-all-fixtures-unstable
just test-rust-unstable

Faster tests with tmpfs

VM tests create qcow2 disk images that hammer your SSD. Mount a dedicated tmpfs so builds happen in RAM:

# NixOS config
fileSystems."/tmp-braid" = {
  device = "tmpfs";
  fsType = "tmpfs";
  options = [ "size=16G" "mode=0755" ];
};

just test-vm automatically passes --option build-dir /tmp-braid when the mount exists.

Building the CLI

nix build .#braid-cli-unwrapped

The wrapped .#braid and default put btrfs/luks tooling on PATH and are Linux-only; on macOS build the pure-Rust braid-cli-unwrapped. Rust source lives in cli/.

Upgrading dependencies

braid targets the latest stable NixOS release (nixos-26.05) and uses whatever package versions that channel provides – no custom pins or overlays. Versions are locked to a specific nixpkgs commit in flake.lock.

1. Update nixpkgs

nix flake update

2. Check versions

nix eval --raw nixpkgs#btrfs-progs.version
nix eval --raw nixpkgs#systemd.version
nix eval --raw nixpkgs#autosuspend.version
nix eval --raw nixpkgs#cryptsetup.version
nix eval --raw nixpkgs#util-linux.version
nix eval --raw nixpkgs#smartmontools.version

3. Update vendored reference source

reference/ contains upstream source used for code-level reference (parser behavior, output formats, config schemas). Most entries track flake.lock through nixpkgs-pinned tools; nix-crate tracks Cargo.lock. After a flake update, refresh the nixpkgs-pinned entries to match the new versions:

just fetch-references

4. Refresh fixtures and run tests

A nixpkgs bump can change tool output formats, which breaks parsers. Run the full validation sequence:

just capture-all-fixtures
just test-rust
just test-parsers
just test-vm

5. Update vendored crate sources

After any change that touches the nix line in cli/Cargo.toml, or any cargo update-driven bump to the nix package in Cargo.lock, refresh the crate source:

just fetch-references nix-crate

Releasing

Copy-pasteable runbook for cutting a braid release. The design rationale (why the release branch is the channel, why the x86_64-linux build runs in CI, why no-follows is the consumer default, the public-repo trust model) lives in ADR 029.

Prerequisites (one-time)

  • The public Cachix cache braid exists; you have captured its public key (braid.cachix.org-1:...).
  • CACHIX_AUTH_TOKEN (a push token for that cache) is set as a GitHub Actions repo secret.
  • The release branch is not branch-protected against the Actions token – CI fast-forwards it with GITHUB_TOKEN.
  • The releaser can push directly to master. cargo release commits the bump and pushes it to master (not via a PR), so any required-PR ruleset on master must exempt the releaser, or just release fails mid-run after the local commit/tag.
  • Run from nix develop .#release, which provides cargo-release, cargo, gh, and just on the Mac; the default devShell is Linux-only, so it cannot supply cargo there.

Before releasing

braid does not run the NixOS VM suite in CI, and neither just release nor release.yml requires a VM result. VM coverage is a manual, per-release choice: when a release warrants it, run the suite outside the release automation – either locally:

just test-vm

or by triggering test.yml manually via workflow_dispatch (its only active trigger). Do not re-enable test.yml’s push/pull_request triggers, and do not wire just release to depend on it.

just test-rust (fast, no VM) does gate the release automatically: release.yml re-runs it on the tag, and just release runs a local compile gate (nix build .#braid-cli-unwrapped) before tagging.

Normal release

From nix develop .#release:

just release <patch|minor|major>

This bumps cli/Cargo.toml + the braid-cli entry in Cargo.lock, commits chore(release): vX.Y.Z, tags vX.Y.Z, and pushes master + the tag. The tag triggers release.yml. Follow CI:

gh run list --workflow release.yml
gh run watch <run-id>

release.yml builds the x86_64-linux binary, pushes it to the braid cache, creates the GitHub release, and – last – fast-forwards the release branch (the consumer channel). Because the FF is last, consumers see the new rev only after the cache is warm and the release object exists.

Pre-1.0 bumps are plain semver:

| Level | From  | To    |
|-------|-------|-------|
| patch | 0.0.1 | 0.0.2 |
| minor | 0.0.1 | 0.1.0 |
| major | 0.0.1 | 1.0.0 |

So a minor bump from 0.0.1 goes to 0.1.0, not 0.0.x; this is expected behavior, not a surprise.
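
The bump rule in the table can be sketched in a few lines of Rust. This only illustrates plain semver arithmetic, not the code that cargo release runs; `Level` and `bump` are hypothetical names introduced for the example.

```rust
/// Release level, mirroring the `<patch|minor|major>` argument.
/// Hypothetical type for illustration only.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Level {
    Patch,
    Minor,
    Major,
}

/// Plain semver bump: the chosen component increments and every
/// lower component resets to zero.
pub fn bump((major, minor, patch): (u64, u64, u64), level: Level) -> (u64, u64, u64) {
    match level {
        Level::Patch => (major, minor, patch + 1),
        Level::Minor => (major, minor + 1, 0),
        Level::Major => (major + 1, 0, 0),
    }
}
```

Calling `bump((0, 0, 1), Level::Minor)` resets the patch component, which is why the result is 0.1.0 and not another 0.0.x version.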

Consumers upgrade by bumping the lock to the new release tip:

nix flake update braid   # then nixos-rebuild switch

(A consumer may wrap this in a shortcut, e.g. a braid:upgrade shell function.)

One active release tag at a time. release.yml sets queue: max, so a burst of tags all queue (up to 100, FIFO by the time each starts waiting on the concurrency group) and none is dropped. But that order is wait-start time, not dispatch time, so pushing the next tag before the prior release.yml run finishes risks two tags starting out of dispatch order – the older one’s release fast-forward then fails as a non-fast-forward. That outcome is benign for consumers (release only ever moves forward) but shows a red run. So push (or just release) one tag at a time.

Release notes

The GitHub release body is generated by git-cliff from commit subjects, grouped by conventional-commit type (config in cliff.toml). Named types render into stable sections such as Features, Bug Fixes, Documentation, Tests, CI, Build, and Chores; anything unmatched lands in Other. The first release (v0.0.1) is a one-time exception and intentionally has a blank release body; later genuinely empty rendered ranges get a _No notable changes._ placeholder.

Preview the next release’s notes before tagging:

just changelog

(renders commits since the last tag). Before the first v* tag exists, this prints nothing to match the blank v0.0.1 release body. Editing a release body never affects consumers – the release branch fast-forward is what publishes.

The first release

The first release is not special: it is just release patch, the same flow as every later release. The in-tree version is the pre-release 0.0.0, so the first just release patch cuts v0.0.1 (0.0.0 -> 0.0.1); all later runs bump from 0.0.1.

Two first-run-only things happen for free, with no extra steps:

  • release.yml’s final git push origin <commit>:refs/heads/release creates the release branch (the ref does not exist yet, so the first push makes it).
  • gh release create cuts the first GitHub release (no pre-existing release required). The v0.0.1 release body is intentionally blank instead of a whole-history changelog; git-cliff notes begin with later releases.

Because CI has no VM gate, run the behavioral suite locally before this first cut:

just test-vm
just test-rust

If release CI fails

First rule: never re-run just release after a tag exists – that would bump again.

  • Transient or config-only failure – re-run the existing workflow:

    gh run rerun <run-id>
    gh run watch <run-id>
    
  • Bad tagged code – fix master, then move the same version tag to the fixed commit:

    git push origin :refs/tags/vX.Y.Z
    git tag -d vX.Y.Z
    git tag -a vX.Y.Z -m vX.Y.Z
    git push origin vX.Y.Z
    

Why this is safe: the release fast-forward is the last step, so until it runs release has not advanced and consumers cannot nix flake update to the new rev – a failure at any earlier step (test, build, cache, or gh release create) leaves consumers untouched. Re-running converges: the cache push and gh release create are idempotent, and the FF re-pushes the same commit.

Testing notes

Test conventions and NixOS VM test framework reference for braid. The short three-bullet preamble contract (Intent / Why it exists / Scenario) lives in AGENTS.md at the repo root; everything else – the literal preamble form, the flake.nix registration rule, framework gotchas, and patterns – is here. For the lifecycle test suite see tests/module/systemd-lifecycle.py. For the Rust-level TUI view snapshot tests (insta-based, run via just test-rust), see tui-snapshots.md.

Conventions

Preamble: literal // line-comment form

Every test’s preamble is a contiguous block of // line comments directly above the test item.

  1. Intent — what behavior this test verifies (or tries to verify)
  2. Why it exists — what risk/regression this protects against
  3. Scenario — the real-world user/system story this models, especially the concrete bug or incident that inspired the test

// Intent: one-line statement of the behavior verified.
// Why it exists: the regression risk this protects against, ideally with
//   reference to the incident or commit that prompted it.
// Scenario: the concrete real-world sequence the test models.
#[test]
fn the_test() {
    // test body goes here
}

New VM tests must register in flake.nix

just test-vm and just test-all build whatever is registered under checks.<system> in flake.nix – there is no default per-test list in the justfile. When adding a new tests/cli/*.nix or tests/module/*.nix, also add a matching pkgs.testers.nixosTest (import ./tests/cli/<name>.nix { braid = linuxCrane.braid; }) entry to flake.nix. An unregistered test sits in the tree but never runs under nix flake check.

VM-test framework gotchas

just test-repro requires the full repro- prefix

just test-repro <name> and just test-vm <name> pass the test name verbatim to nix as a final attribute selector. The reproChecks flake output is built with filterAttrs, which keeps each attribute’s full name, so the repro- prefix stays in the filtered set. The name passed to just test-repro must therefore be exactly the name in flake.nix, prefix and all.

# correct
just test-repro repro-btrfs-replace-interrupted-mid-flight

# wrong -- fails with "flake ... does not provide attribute ... reproChecks.aarch64-darwin.btrfs-replace-interrupted-mid-flight"
just test-repro btrfs-replace-interrupted-mid-flight

The test-vm checks set excludes entries whose names start with repro-, so test-vm test names never carry that prefix (e.g. cli-recover-replace-completed).

NixOS test driver wraps every command with set -euo pipefail

The driver auto-prepends set -euo pipefail to every machine.succeed / machine.execute command before sending it to the VM. This is invisible from the test script but has real consequences for chained commands.

Symptom: A chain like ... ; wait $pid_loser ; echo $? > /tmp/exit-a ; ... silently aborts when wait returns non-zero. The exit-code file is never written, and the next subtest assertion fails with cat: /tmp/exit-a: No such file or directory – pointing at the wrong layer.

Idiom for capturing a non-zero exit without aborting:

ec_a=0 ; wait $pid_a || ec_a=$? ; echo $ec_a > /tmp/exit-a

The || consumes the non-zero into the variable, so errexit does not fire. Works for any command whose non-zero exit is expected (wait, grep, diff, etc.). This matters most in concurrent-process tests where one process is expected to exit non-zero (fail-fast lock contention, expected error paths).
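
The difference between the naive chain and the idiom can be reproduced outside a VM. The sketch below assumes `bash` is on PATH and prepends the same `set -euo pipefail` prefix by hand; it does not use the NixOS test driver, and `run_like_driver`, `NAIVE`, and `CAPTURED` are hypothetical names.

```rust
use std::process::Command;

/// Runs `script` under bash with the prefix the driver is described as
/// prepending, and returns (shell exit code, trimmed stdout).
/// Assumption: `bash` is on PATH; this is not the NixOS test driver.
pub fn run_like_driver(script: &str) -> (i32, String) {
    let output = Command::new("bash")
        .arg("-c")
        .arg(format!("set -euo pipefail; {}", script))
        .output()
        .expect("failed to spawn bash");
    let stdout = String::from_utf8_lossy(&output.stdout).trim().to_string();
    (output.status.code().unwrap_or(-1), stdout)
}

/// Naive chain: `wait` returns 3, errexit aborts the shell, `echo` never runs.
pub const NAIVE: &str = "(exit 3) & pid=$! ; wait $pid ; echo $?";

/// Idiom from above: `||` absorbs the non-zero status into a variable.
pub const CAPTURED: &str = "(exit 3) & pid=$! ; ec=0 ; wait $pid || ec=$? ; echo $ec";
```

Running `NAIVE` exits with the child's status and prints nothing, while `CAPTURED` exits 0 and prints the child's status, which is the value a later assertion needs.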

Python f-strings without placeholders fail the build-time linter

NixOS VM test scripts are linted at build time. f-strings without {placeholder} variables (e.g. f"Missing foo in config") cause a build failure: f-string is missing placeholders.

In tests/**/*.py, never use f"..." without at least one {variable} inside. Write static messages as plain "literal" strings, and use "literal" + variable for assertion messages that include dynamic values.

Patterns

Regression test quality

Regression tests must fail when the bug is reintroduced. Test the layer where production failed, not a downstream parser or helper that only proves later code works when given correct input.

For error propagation, assert the typed variant and payload. Use exact rendered strings only for tests whose purpose is to lock Display or user-facing output. If a change reclassifies an error, production and tests should call the same mapping helper; do not hand-build the target variant in the test.
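
A minimal sketch of that rule, using a made-up error type: `PoolConfigError` and `map_read_failure` are hypothetical and not taken from braid's source. The point is that a test calls the same mapping helper as production and asserts the variant and payload, leaving the rendered string to a dedicated Display test.

```rust
use std::fmt;
use std::io::ErrorKind;

/// Hypothetical error type; braid's real variants will differ.
#[derive(Debug, PartialEq, Eq)]
pub enum PoolConfigError {
    Missing { path: String },
    Unreadable { path: String, kind: ErrorKind },
}

impl fmt::Display for PoolConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            PoolConfigError::Missing { path } => write!(f, "pool config not found at {}", path),
            PoolConfigError::Unreadable { path, kind } => {
                write!(f, "cannot read pool config at {}: {:?}", path, kind)
            }
        }
    }
}

/// Shared mapping helper: production code and tests both call this, so a
/// reclassification changes one place and the tests follow automatically.
pub fn map_read_failure(path: &str, kind: ErrorKind) -> PoolConfigError {
    match kind {
        ErrorKind::NotFound => PoolConfigError::Missing { path: path.to_string() },
        other => PoolConfigError::Unreadable { path: path.to_string(), kind: other },
    }
}
```

A propagation test would then use something like `assert!(matches!(err, PoolConfigError::Unreadable { kind: ErrorKind::PermissionDenied, .. }))`, and only a test that exists to lock the wording would compare `err.to_string()`.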

For user-visible CLI output or control-flow bugs, prefer a CLI/VM test that drives the real command. If stdout vs stderr matters, capture them separately with >stdout 2>stderr; merged streams do not pin routing. Render or preview helpers that form a user-visible boundary need exact-output coverage for every branch, including no-op branches.
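
When the command is driven from Rust instead of a shell redirect, the same separation comes from reading the two captured streams independently. A minimal sketch, assuming a POSIX `sh` on PATH; `run_split` is a hypothetical helper, not part of braid.

```rust
use std::process::Command;

/// Runs a shell command and returns (stdout, stderr) as separate strings,
/// so a test can pin which stream a message was routed to.
/// Assumption: a POSIX `sh` is on PATH.
pub fn run_split(script: &str) -> (String, String) {
    let output = Command::new("sh")
        .arg("-c")
        .arg(script)
        .output()
        .expect("failed to spawn sh");
    (
        String::from_utf8_lossy(&output.stdout).into_owned(),
        String::from_utf8_lossy(&output.stderr).into_owned(),
    )
}
```

Asserting on each string separately fails if a warning moves from stderr to stdout, which a merged capture would not detect.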

Keep repro tests focused. If adjacent behavior already has dedicated coverage, cite that test instead of bundling another phase into a repro whose failure would become ambiguous.

When a dead test has a name that points at a real user-visible contract, replace it with a real regression test by default. Deleting the dead test turns bad coverage into no coverage.

Live-tool behavior locks

When braid code is changed to depend on a specific external-tool behavior – a particular exit code, a particular output wording, a particular return-value path – mocked unit tests prove the classifier is correct given the assumed behavior, but they do NOT prove the tool still behaves that way. A nixpkgs bump that changed cryptsetup’s exit-code contract would silently misclassify in production while every mocked test still passed.

Whenever a plan introduces a classifier of the form exit_code == <N> or stderr.contains("<wording>") against an external tool, identify (or add) a live-tool repro/VM test that asserts the same code/wording directly. List that test in the plan’s verification section as a required gate. If the live-tool test would be non-trivial to add, pause and reconsider whether the classifier is actually robust.

This is the same family as braid’s parser-compatibility lanes (just test-parsers, just test-rust-unstable, see parser compatibility) – those lock the parser against tool-output drift; a behavior-lock test locks an exit-code or wording classifier against the same drift surface. Reference example: tests/repro/cryptsetup-close-mounted.py asserts exit_code == 5 for busy-close and exit_code == 4 for already-closed, behavior-locking the assumption that the cli/src/lock.rs retry classifier depends on.
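
To make the gap concrete, here is a sketch of the kind of classifier such a test protects. It is not the code in cli/src/lock.rs; `CloseOutcome` and `classify_close` are hypothetical, and the only grounded values are the exit codes 5 and 4 that the reference repro test asserts.

```rust
/// Hypothetical outcome type for a `cryptsetup close` call.
#[derive(Debug, PartialEq, Eq)]
pub enum CloseOutcome {
    Closed,
    AlreadyClosed,
    BusyRetry,
    Failed(i32),
}

/// Classifies an exit code. 5 (busy) and 4 (already closed) are the values
/// the reference repro test asserts against the live tool.
pub fn classify_close(exit_code: i32) -> CloseOutcome {
    match exit_code {
        0 => CloseOutcome::Closed,
        4 => CloseOutcome::AlreadyClosed,
        5 => CloseOutcome::BusyRetry,
        other => CloseOutcome::Failed(other),
    }
}
```

A unit test of `classify_close(5)` only proves the match arm. If the tool started returning a different code for a busy device, that unit test would still pass, and only the live-tool test would fail.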

VM and command test design

Before inventing VM setup for missing disks, degraded mounts, ENOSPC, hotplug, or similar storage state, search tests/cli/, tests/repro/, and tests/hw/ for an existing pattern and reuse it where it fits.

Before proposing a VM test for a mutating command, search the same area for existing notes that say a shape is infeasible, and read sibling tests to learn which seams already exist.

For ordering invariants like “persist state before post-operation maintenance”, prefer a deterministic command-layer failure-injection test: allow the persistence step to succeed, force the next maintenance step to fail, then assert the persisted state is current and the journal still exists.
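
A minimal sketch of that failure-injection shape, with every name (`World`, `Maintenance`, `FailingMaintenance`, `finish_operation`) invented for illustration and no relation to braid's actual command layer:

```rust
/// Hypothetical in-memory stand-in for persisted state plus a journal file.
pub struct World {
    pub persisted: Option<u32>,
    pub journal_exists: bool,
}

/// Seam for the post-operation maintenance step.
pub trait Maintenance {
    fn run(&mut self) -> Result<(), String>;
}

/// Injected failure: maintenance always errors.
pub struct FailingMaintenance;

impl Maintenance for FailingMaintenance {
    fn run(&mut self) -> Result<(), String> {
        Err("injected maintenance failure".to_string())
    }
}

/// Persist first, then run maintenance, and clear the journal only after
/// every step has succeeded.
pub fn finish_operation(
    world: &mut World,
    new_state: u32,
    maintenance: &mut dyn Maintenance,
) -> Result<(), String> {
    world.persisted = Some(new_state);
    maintenance.run()?;
    world.journal_exists = false;
    Ok(())
}
```

The test passes `FailingMaintenance`, expects an `Err`, and then asserts that `persisted` holds the new state and `journal_exists` is still true. Swapping the first two statements in `finish_operation` would make that test fail, which is the property the ordering invariant needs.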

When code touches kernel async workers, mount-session caches, or device-layer teardown, mocked unit tests are not enough. Run the relevant VM or repro test, inspect full logs when it fails, and repeat timing-sensitive repros enough to rule out a lucky pass.

For cmd_* boolean gates derived from multiple inputs, route both branches through the same injected seam and test the matrix cells that distinguish the intended gate from plausible wrong gates.
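
As an illustration of choosing matrix cells, consider a made-up two-input gate. `should_run_maintenance` and its inputs are hypothetical; the sketch only shows the cell selection, not the injected seam.

```rust
/// Hypothetical gate: run post-operation maintenance only when the pool is
/// mounted and no balance is paused.
pub fn should_run_maintenance(pool_mounted: bool, balance_paused: bool) -> bool {
    pool_mounted && !balance_paused
}

/// (pool_mounted, balance_paused, expected). Each cell is annotated with the
/// plausible wrong gate it rules out.
pub const GATE_MATRIX: [(bool, bool, bool); 3] = [
    (true, false, true),   // baseline: the gate opens
    (true, true, false),   // rules out a gate that checks only `pool_mounted`
    (false, false, false), // rules out a gate that checks only `!balance_paused`
];
```

A test that covers only the baseline cell would also pass against either single-input gate, so it would not pin the intended behavior.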

For one-off sequenced or stateful command-test behavior, prefer a file-local runner or wrapper over widening the shared MockRunner. Reserve shared runner API changes for behavior that many tests need.

When removing sleep wall-time from tests, inject a sleeper dependency. Do not use #[cfg(test)] to zero a production timing constant whose value is part of the behavior.
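
A sketch of the injected-sleeper shape, assuming a simple retry loop. `Sleeper`, `ThreadSleeper`, `RecordingSleeper`, `RETRY_DELAY`, and `retry_with_delay` are all hypothetical names, and the 500 ms value is illustrative.

```rust
use std::cell::RefCell;
use std::time::Duration;

/// Production timing constant; its value stays the same in test builds.
pub const RETRY_DELAY: Duration = Duration::from_millis(500);

/// Seam for sleeping.
pub trait Sleeper {
    fn sleep(&self, duration: Duration);
}

/// Production implementation: actually blocks the thread.
pub struct ThreadSleeper;

impl Sleeper for ThreadSleeper {
    fn sleep(&self, duration: Duration) {
        std::thread::sleep(duration);
    }
}

/// Test implementation: records requested durations without waiting.
#[derive(Default)]
pub struct RecordingSleeper {
    pub slept: RefCell<Vec<Duration>>,
}

impl Sleeper for RecordingSleeper {
    fn sleep(&self, duration: Duration) {
        self.slept.borrow_mut().push(duration);
    }
}

/// Retries `attempt` up to `max_attempts` times, sleeping between failures.
pub fn retry_with_delay(
    sleeper: &dyn Sleeper,
    max_attempts: u32,
    mut attempt: impl FnMut() -> bool,
) -> bool {
    for n in 1..=max_attempts {
        if attempt() {
            return true;
        }
        if n < max_attempts {
            sleeper.sleep(RETRY_DELAY);
        }
    }
    false
}
```

With `RecordingSleeper` the test runs instantly and can still assert that the loop requested the real `RETRY_DELAY` the expected number of times, which zeroing the constant under `#[cfg(test)]` would hide.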

Eval-time test isolation: disable, don’t stub

When an eval-time test (lib.evalModules in isolation) breaks because of a new NixOS option dependency, disable the unrelated feature in the test config rather than expanding the fake module surface with stubs.

Stubbing options (e.g. adding options.users) makes the test less isolated and can mask future accidental dependencies on unrelated NixOS top-level options. Disabling the feature that introduced the dependency keeps the test focused.

When fixing eval-time test failures caused by new module dependencies, first check if the dependency comes from a feature the test doesn’t need. If so, set that feature’s config to its “off” value (e.g. poolAccessGroup = null) instead of adding option stubs.

Parser compatibility

braid parses output from btrfs-progs, cryptsetup, util-linux, smartmontools, NUT, and ethtool. These parsers can break when tool versions change. Two validation lanes exist:

Stable lane (pinned contract)

  • just test-parsers — CLI parser canary. Exercises CLI-reachable parsers against live tool output in VMs (including braid-status-ups, the NUT canary).
  • just test-rust — validates golden fixtures for the full parser set, including parse_upsc. Fixture-backed coverage stays current only after running just capture-all-fixtures when parser-critical tool versions change (e.g. nixpkgs bump).
  • Fixture refresh is a separate obligation: just test-parsers passing does not guarantee TUI-only parsers (parse_lsblk_json, parse_cryptsetup_luks_dump) or unused parsers (parse_btrfs_scrub_status_per_device) are compatible with the current toolchain.
  • parse_smartctl (the SMART health parser) is reachable from both the TUI and the braid status CLI command, so it is no longer TUI-only. It is still not covered by the live VM canary, though: virtio disks emit no usable SMART, so just test-parsers cannot exercise it. Its drift canary is the stable-only smartctl golden fixture (see the smartctl-fixtures note below).
  • Fixtures in cli/tests/fixtures/nixos-26.05/ are committed and authoritative. NUT fixtures live in cli/tests/fixtures/nixos-26.05/upsc/ (and the unstable mirror); they are produced by just capture-ups-fixtures, which boots a dedicated NUT VM with per-state dummy-ups drivers (see tests/capture-ups-fixtures.nix).
  • smartctl fixtures are stable-only by design. VM virtio disks do not emit useful SMART data, so just capture-all-fixtures does not regenerate smartctl-sata-with-temperature.json or smartctl-selftest-*.json. smartctl-sata-with-temperature.json is a one-time physical-drive capture; smartctl-selftest-*.json fixtures are hand-authored (see cli/tests/fixtures/nixos-26.05/README.md). The tool-versions VM test checks that smartctl resolves to a /nix/store/ path on the VM’s PATH and that its self-reported version matches pkgs.smartmontools.version, but it does not detect nixpkgs version bumps because both sides advance together. On any nixpkgs bump that touches smartmontools, manually review and refresh smartctl-selftest-*.json against the new ata_smart_self_test_log.standard JSON shape and smartctl-sata-with-temperature.json against the new health/temperature JSON shape (smart_status, temperature, ata_smart_attributes).
  • ethtool WoL fixtures are hand-authored / no-live-capture. VM virtio NICs do not emit useful Wake-on-LAN data, so just capture-all-fixtures does not regenerate ethtool output. The doctor wake_on_lan parser is covered by hand-authored Rust unit fixtures, and wrapper provenance is covered by the override-based VM tests in tool-versions and braid-auto-suspend.
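
For readers new to golden fixtures, the sketch below shows the general shape of a fixture-backed parser check. It is not braid's `parse_upsc`: `parse_key_values` is a stand-in for `key: value` output, and `FIXTURE` is an inlined, illustrative substitute for a committed fixture file (a real test would more likely load the file, for example with `include_str!`).

```rust
use std::collections::BTreeMap;

/// Stand-in parser for `key: value` lines (not braid's `parse_upsc`).
/// Any non-empty line without a `": "` separator is a hard error, so format
/// drift in a refreshed fixture fails the test instead of being skipped.
pub fn parse_key_values(output: &str) -> Result<BTreeMap<String, String>, String> {
    let mut map = BTreeMap::new();
    for line in output.lines().filter(|l| !l.trim().is_empty()) {
        let (key, value) = line
            .split_once(": ")
            .ok_or_else(|| format!("unparseable line: {:?}", line))?;
        map.insert(key.trim().to_string(), value.trim().to_string());
    }
    Ok(map)
}

/// Inlined stand-in for a committed fixture; the values are illustrative.
pub const FIXTURE: &str = "battery.charge: 100\nups.status: OL\n";
```

The test parses the fixture and asserts specific fields. After a tool bump, recapturing the fixture and re-running the same assertions is what surfaces an output-format change.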

Parser-critical tool versions are the pinned nixpkgs versions of btrfs-progs, cryptsetup, util-linux, nut, smartmontools, and ethtool. Treat any change to the nixpkgs node in flake.lock, any flake.nix change that alters the nixpkgs input, or any change to braid.packages.{btrfsProgs,cryptsetup,utilLinux,nut,smartmontools,ethtool} as a required fixture-refresh event.

When parser-critical tool versions change, run:

  1. just capture-all-fixtures
  2. just test-rust
  3. just test-parsers

Unstable lane (tracked forecast)

Early-warning lane for upstream parser/output drift. Unstable failures signal upcoming changes, not a contract violation. Fixtures in cli/tests/fixtures/nixos-unstable/ are committed so upstream output changes are visible in git history, but they are non-authoritative.

  • just test-all-unstable – VM tests against nixos-unstable. Covers CLI-reachable parsers against live tool output but does not cover the full parser surface (TUI-only parsers, unused parsers, smartctl).
  • just capture-all-fixtures-unstable + just test-rust-unstable – covers btrfs/cryptsetup/util-linux/NUT against unstable tool output via golden fixtures. Missing fixtures fail (not skip).
  • smartctl and ethtool have no unstable fixtures. Unstable capture/test coverage intentionally covers btrfs/cryptsetup/util-linux/NUT only; see the Stable lane for why smartctl fixtures are stable-only and how to refresh them on smartmontools bumps, and why ethtool WoL output is hand-authored instead of live-captured.

Full unstable canary workflow:

  1. just test-all-unstable
  2. just capture-all-fixtures-unstable
  3. just test-rust-unstable

Reference source

Before searching the web for tool behavior, consult local resources first. reference/ contains shallow clones of upstream repos at the versions pinned in nixpkgs, plus Rust crate sources pinned in Cargo.lock. Refresh with just fetch-references.

When to look: Any time you’re implementing, modifying, or debugging code that interacts with these tools — especially parsers. Read the relevant source before making assumptions about output format or behavior.

  • btrfs-progs: kdave/btrfs-progs
    • Source: reference/btrfs-progs/cmds/ — one file per subcommand (e.g. cmds/scrub.c). Parser output formats, exit codes.
    • Docs: reference/btrfs-progs/Documentation/ — RST. See btrfs docs below for the topic table.
  • systemd: systemd/systemd
    • Source: reference/systemd/src/ — unit lifecycle internals, systemd-ask-password, mount/automount.
    • Docs: reference/systemd/docs/ — markdown design docs (BOOT.md, INHIBITOR_LOCKS.md, MOUNT_REQUIREMENTS.md, CREDENTIALS.md, PASSWORD_AGENTS.md, etc.). reference/systemd/man/ — XML man-page sources for unit/option reference (systemd.service.xml, systemd.mount.xml, …).
  • autosuspend: languitar/autosuspend
    • Source: reference/autosuspend/src/ — check classes, config schema, wakeup scheduling.
    • Docs: reference/autosuspend/doc/source/ — RST (available_checks.rst, available_wakeups.rst, configuration_file.rst, systemd_integration.rst).
  • cryptsetup: cryptsetup/cryptsetup
    • Source: reference/cryptsetup/src/ (CLI), reference/cryptsetup/lib/ (libcryptsetup) — luksDump output, LUKS2 header structure, keyslot operations.
    • Docs: reference/cryptsetup/man/*.8.adoc man pages (cryptsetup-luksDump.8.adoc, cryptsetup-open.8.adoc, …). reference/cryptsetup/docs/ — design notes including LUKS2-locking.txt and on-disk-format-luks2.pdf.
  • util-linux: util-linux/util-linux
    • Source: reference/util-linux/misc-utils/ (lsblk, blkid), reference/util-linux/sys-utils/ (mount, umount), reference/util-linux/libmount/, reference/util-linux/libblkid/ – lsblk JSON schema, blkid output, mount/unmount behavior.
    • Docs: Man pages live next to source as *.8.adoc (e.g. misc-utils/lsblk.8.adoc, sys-utils/mount.8.adoc). reference/util-linux/Documentation/ is project meta (build/test/contribution notes), not user reference.
  • smartmontools: smartmontools/smartmontools
    • Source: reference/smartmontools/smartmontools/ — flat layout. smartctl output format, SMART attribute definitions, exit codes.
    • Docs: No separate docs dir. Man-page sources are inline alongside the code: smartctl.8.in, smartd.8.in, smartd.conf.5.in.
  • hddfancontrol: desbma/hddfancontrol
    • Source: reference/hddfancontrol/src/ — Rust daemon. device/ (drivetemp, hddtemp, smartctl probing), probe/ (pwm-test ramp logic), fan.rs (PWM control), pwm.rs (sysfs PWM I/O), cl.rs (CLI args).
    • Docs: No separate docs dir. reference/hddfancontrol/README.md and reference/hddfancontrol/systemd/hddfancontrol.service — the upstream unit we intentionally don’t use (see modules/braid/fan-control.nix).
  • nut: networkupstools/nut
    • Source: reference/nut/clients/ (upsmon.c – shutdown-on-LB daemon, upsc.c – status query, upscmd.c, upssched.c, upsrw.c), reference/nut/server/ (upsd.c and net protocol handlers), reference/nut/drivers/ (usbhid-ups.c and per-vendor *-hid.c for the USB HID path v1 targets).
    • Config schema: reference/nut/conf/ — sample files (nut.conf.sample, ups.conf.sample, upsd.conf.sample, upsd.users.sample, upsmon.conf.sample.in, upssched.conf.sample.in). Authoritative for fields braid generates into /etc/nut/*.
    • Docs: reference/nut/docs/man/*.txt asciidoc man pages for daemons, drivers, and config files. reference/nut/docs/ — design notes (design.txt, net-protocol.txt, developer-guide.txt, new-drivers.txt, FAQ.txt).
  • linux: torvalds/linux
    • Source: reference/linux/ — kernel source at the exact version pinned in nixpkgs. Look in fs/btrfs/ for btrfs-specific I/O scheduling, raid handling, and read balancing logic. drivers/md/ for raid and block layer behavior.
    • Use for: Understanding kernel-level I/O behavior, raid1 read balancing, mount semantics, block device management.
  • coreutils: coreutils/coreutils (GitHub mirror of GNU Coreutils)
    • Source: reference/coreutils/src/ — one C file per utility (e.g. src/timeout.c, src/realpath.c, src/stat.c, src/chmod.c, src/chown.c, src/head.c, src/base64.c). Read these to confirm what each helper actually guarantees – e.g. timeout(1) exit-code semantics and signal forwarding live in src/timeout.c, not in any manpage.
    • Docs: reference/coreutils/doc/coreutils.texi — the canonical reference manual (per-utility sections inside one big Texinfo file). Per-utility manpage stubs live in reference/coreutils/man/ as *.x (e.g. man/timeout.x); these are short prologues that get merged with --help output by help2man at build time, so the full prose is in coreutils.texi.
    • Use for: Any time braid code or a plan reasons about a Coreutils helper’s behavior beyond the obvious — exit codes, signal handling, race windows, --help text, edge cases. Especially timeout(1): timeout cannot bound an uninterruptible kernel wait, and the proof is in src/timeout.c’s use of kill() against a userspace child.
  • nix (Rust crate): nix-rust/nix
    • Source: reference/nix-crate/src/ – Rust crate at the version pinned in Cargo.lock, not flake.lock. unistd.rs (User/Group/chown/exec helpers, fd ownership types), fcntl.rs (open, flock, OFlag), errno.rs (Errno), sys/stat.rs (Mode), sys/signal.rs (sigaction, signal handlers), sys/termios.rs (termios constants, terminal flags).
    • Docs: No separate docs dir – rustdoc is inline as /// doc comments on each item. reference/nix-crate/Cargo.toml declares the feature gates (braid currently enables fs, user, term, and signal); consult it before reaching for a nix API to confirm which feature it lives under.
    • Use for: Touching any nix:: API, checking feature gates, understanding fd-ownership types, signal-safe helpers, or termios constants. Refresh after any change to the nix line in cli/Cargo.toml or any cargo update-driven bump in Cargo.lock.

btrfs docs

  • Docs: reference/btrfs-progs/Documentation/ — RST docs from btrfs-progs. Start with index.rst for a full table of contents, or use the topic table below for common lookups. Glob by keyword for anything not in the table. ch-* fragments are inlined by just fetch-references.

| Topic | File(s) |
|-------|---------|
| Adding/removing devices | Volume-management.rst, btrfs-device.rst |
| Device replacement | btrfs-replace.rst |
| Rebalancing | Balance.rst, btrfs-balance.rst |
| RAID profiles (RAID1 etc.) | mkfs.btrfs.rst (search for “profiles”) |
| Mount options | btrfs-man5.rst |
| Scrub / self-healing | Scrub.rst |
| Filesystem limits & storage model | btrfs-man5.rst |
| Administration overview | Administration.rst |

Citing reference/ code

braid’s own tracked files are cited by path#symbol or path#heading-slug (doc and ADR file references); external upstream code is different. It lives in reference/, which is gitignored and refreshed wholesale by just fetch-references: it is absent on a clean checkout and invisible to CI. A line number into it drifts on every refresh, and a braid-style path#symbol is not greppable when the file is not on disk – neither form validates or even resolves. Cite external upstream code by its shape:

  • Short, behavior-defining snippet – one line or small function emitting a format, token, or exit code braid parses. Inline the excerpt as frozen ground truth, so a reader sees the contract without fetching reference/. Stamp it pkg <version>, <path> (fn name) and drop the line number. Fence the excerpt with a non-rust language tag – c for source, text for tool output – so rustdoc does not run it as a doctest. An unannotated or rust-tagged block becomes a failing doctest, caught by cargo test -p braid-cli --doc (not just test-rust, whose --lib --bin --test selectors skip doctests). Precedent: cli/src/parse/cryptsetup_luks_version.rs#parse_cryptsetup_luks_version. An inline code span (`printf(...)`) is fine for a tight function or field doc where a fenced block is too heavy. The pkg <version> stamp is the upstream release tag (git -C reference/<pkg> describe --tags); it pins the excerpt and is the re-verify trigger when a nixpkgs bump changes that tool’s version – the same parser-compatibility refresh event that recaptures fixtures.
  • Region or multi-line – a code area with no single quotable line (a long function, a struct, two scattered lines). Keep a pointer, not a wall of inlined code: pkg <version>, <path> (fn name) plus a one-line paraphrase of what’s there. Prefer a function name over a line number; a bare line range is a last resort.

Existing bare-line-number reference/ citations are tolerated – nothing validates them either way – but migrate them toward the excerpt or pointer form when you next touch the surrounding file.

TUI Snapshot Testing with Ratatui + Insta

Rendering for snapshots

Each TUI view module’s #[cfg(test)] block defines a small render helper that draws the view into a TestBackend and returns the terminal; the test then asserts via the shared snap! helper. render is per-module (it calls that module’s own view function); buffer_to_string and the snap! macro are shared from cli/src/tui/test_support.rs.

use ratatui::{backend::TestBackend, Terminal};

use crate::tui::test_support::{buffer_to_string, snap};

// Model, view, and the sample_* helpers come from the enclosing view module.

// Per-module: calls this view's draw fn with a fixed `now` for determinism.
fn render(model: &Model, width: u16, height: u16) -> Terminal<TestBackend> {
    let now = time::macros::datetime!(2026-02-24 02:12:00);
    let mut terminal = Terminal::new(TestBackend::new(width, height)).unwrap();
    terminal.draw(|frame| view(model, frame, now)).unwrap();
    terminal
}

#[test]
fn snapshot_with_pool() {
    let model = Model::new_demo(sample_disk_names(), PoolStatus::Mounted(sample_pool()));
    snap!(buffer_to_string(&render(&model, 60, 24)));
}

snap! wraps insta::assert_snapshot! in insta::with_settings!({ prepend_module_to_snapshot => false }, ...). That setting defaults to true; we force it off so snapshot files are named after the test alone (snapshot_with_pool.snap), not braid_cli__tui__view__tests__snapshot_with_pool.snap. Always go through snap! – a bare insta::assert_snapshot! would reintroduce the prefix and write to a different filename.

insta could snapshot the TestBackend directly (it implements Display), but buffer_to_string trims trailing whitespace per line for cleaner diffs, so all view tests assert on its String. Styles/colors are not captured – text only.
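
The trimming step is easy to picture with a standard-library-only sketch. This is not the real `buffer_to_string` (which reads a ratatui buffer); `trim_trailing_per_line` is a hypothetical helper that takes already-extracted rows.

```rust
/// Joins rendered rows into one string, dropping trailing whitespace on each
/// row so padding to the terminal width does not show up in snapshot diffs.
/// Hypothetical stand-in operating on plain strings, not a ratatui buffer.
pub fn trim_trailing_per_line(rows: &[&str]) -> String {
    rows.iter()
        .map(|row| row.trim_end())
        .collect::<Vec<_>>()
        .join("\n")
}
```

Leading whitespace is preserved because it carries layout, and a fully blank row becomes an empty line, so the vertical position of content is preserved as well.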

The cargo insta workflow

  1. cargo test — runs tests normally. New/changed snapshots fail and produce .snap.new files alongside the existing .snap files.
  2. cargo insta review — interactive TUI that walks through each pending change with diffs. Keys: a accept, r reject, s skip.
  3. cargo insta accept — bulk-accepts all pending .snap.new files without review.

Shortcut: cargo insta test --review runs tests then immediately opens the review TUI.

Typical cycle

# Write or change a test → run tests
just test-rust

# Tests fail because snapshot is new/different → .snap.new files appear
# Review the diffs interactively
cargo insta review

# Or if you trust the output, bulk accept
cargo insta accept

# Commit the .snap files

For first-time snapshots (no .snap file yet), cargo test will always fail — run cargo insta review or cargo insta accept to create the initial .snap.

What ratatui recommends

  • TestBackend + insta for integration-level view tests (what we do)
  • Buffer::empty() + direct render for unit-testing individual widgets in isolation, asserting on buffer contents without a full terminal
  • Consistent terminal dimensions (e.g., 80x20) for reproducible snapshots

Planning and review hygiene

  • Re-read the central files immediately before writing or reviewing a plan; do not rely on earlier conversation reads when code may have changed.
  • For renames, refactors, and callsite sweeps, derive the inventory from tracked files with git ls-files plus rg. Be explicit about exclusions and rerun the same search as verification.
  • Before planning recovery or cleanup recipes, verify every step against the current cmd_* / plan_* code and the relevant tool or kernel behavior. Treat issue recipes as hypotheses until the code proves them.
  • Architecture docs describe behavioral contracts, not internal helper names. Verify wrapper process/lifetime claims from the wrapper code before writing docs that depend on them.
  • For external-tool exit-code or wording classifiers, trace the specific subcommand return path in reference/; a shared errno table is not enough to prove one invocation’s behavior.

Mutation safety heuristics

These heuristics elaborate on principle 3 (safe-by-construction operations) for contributors.

  • Query the authoritative source of state directly; do not pre-gate it with a cheaper but weaker observable such as path existence.
  • Put invariant checks at the layer that owns the invariant. Primitive-level checks belong inside the helper that performs the unsafe operation; caller policy gates belong at callsites.
  • Keep diagnostic refinements out of mutating-command state enums when the new distinction only matters for status, doctor, TUI, or error rendering.
  • Set fail-closed policy from the downstream failure mode. If a branch can corrupt state or strand a journal when a preflight is wrong, every uncertainty in that branch is a hard error even if a sibling branch can warn and proceed.
  • Residual invariant checks must be hard errors in all builds; do not replace a production guard with debug_assert!.
  • Split post-commit failure variants by the operator’s remediation and on-disk consequence, not by implementation layer.
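Two of the heuristics above (checks at the layer that owns the invariant, and hard errors instead of `debug_assert!`) can be sketched together. Everything here is hypothetical: `CloseError`, `close_mapper`, and the `mounted` slice are illustration-only names, and the sketch assumes the caller obtained `mounted` from the authoritative source of mount state.

```rust
/// Hypothetical error type for this sketch; not braid's real error enum.
#[derive(Debug, PartialEq)]
enum CloseError {
    /// Closing would pull storage out from under a live filesystem.
    StillMounted { mapper: String },
}

/// Hypothetical primitive that owns the "never close a mounted mapper"
/// invariant. The check lives inside the helper rather than at callsites,
/// and it returns an error in every build profile instead of relying on
/// `debug_assert!`, which is compiled out of release builds.
fn close_mapper(mapper: &str, mounted: &[&str]) -> Result<(), CloseError> {
    if mounted.contains(&mapper) {
        return Err(CloseError::StillMounted {
            mapper: mapper.to_string(),
        });
    }
    // The unsafe operation itself would go here (elided in this sketch).
    Ok(())
}
```

A caller can still add its own policy gate before calling the helper, but it cannot bypass the primitive-level guard by forgetting one.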

Doc and ADR file references

In ADRs, decision docs, and docs/ prose, never reference another file by line number. Line numbers drift the moment surrounding code or text is edited, so the pointer silently goes stale and misleads the next reader. Use a path#anchor reference instead – one shape for both code and docs, where the anchor names what and the path says where:

  • Code: path#symbol as a plain code span, not a link, e.g. (see `cli/src/cmd/unlock.rs#cmd_unlock`). The symbol is a fn, struct, enum, trait, impl, module, or const, method-qualified where it helps (cli/src/cmd/plan.rs#Planner::plan). The symbol is the drift-proof, greppable half – one rg cmd_unlock finds both the citation and the definition. Never write cli/src/cmd/unlock.rs:142, and do not linkify code paths: cli/ lives outside the mdBook root, so a link 404s in the rendered book and dodges linkcheck. A bare file path (no #symbol) is fine when the whole file is the referent.
  • Markdown / mdBook: path#heading-slug as a real Markdown link, e.g. [...](docs/internals/luks-unlock.md#header-backup-workflow-and-messaging), not a line number or section count. Unlike code refs these are clickable and validated by mdbook-linkcheck2, so a renamed heading fails CI instead of rotting silently.

A symbol or heading anchor survives edits and is greppable; a line number is neither. This applies to docs and comments – transient analysis in plans/wip/ is exempt.

This rule governs braid’s own tracked files; external upstream code under reference/ is gitignored and cited differently (by shape, not path#symbol) – see Citing reference/ code.
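To make the line-number rule concrete, here is a crude heuristic that flags `path:line` tokens in prose. It is a hypothetical sketch, not one of braid's checks: `line_number_refs` is an invented name, and the heuristic will also flag unrelated tokens such as `host.example:8080`.

```rust
/// Hypothetical helper: returns whitespace-separated tokens that look like a
/// `path:line` reference, e.g. `cli/src/cmd/unlock.rs:142`. A token matches
/// when the part before the last `:` contains a `.` and the part after it is
/// all ASCII digits. `path#symbol` references never match.
fn line_number_refs(text: &str) -> Vec<&str> {
    text.split_whitespace()
        // Strip surrounding punctuation such as backticks and parentheses.
        .map(|tok| tok.trim_matches(|c: char| matches!(c, '`' | '(' | ')' | ',' | '.' | ';')))
        .filter(|tok| match tok.rsplit_once(':') {
            Some((path, line)) => {
                path.contains('.')
                    && !line.is_empty()
                    && line.chars().all(|c| c.is_ascii_digit())
            }
            None => false,
        })
        .collect()
}
```

Run against a sentence that cites `cli/src/cmd/unlock.rs:142`, it returns that token; the same sentence rewritten with `#cmd_unlock` returns nothing.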

Decision-doc references

A decision doc with status: Superseded or Deprecated is a point-in-time record. Do not rewrite its body or ## See section to track current code – the > Superseded by ... banner is the forward pointer to live artifacts. Repointing a frozen doc’s references at today’s successor code only makes it contradict its own narrative.

Independent of status, a ## See bullet whose path no longer resolves is a broken pointer, not history. Drop it; or, if the removed file has lasting reference value (an archived design doc or plan – not deleted dead code), replace the bare path with the git-history-note form used in 002 and 003: (preserved in git history; last present at commit <hash>). The ## See path half of this rule is enforced by scripts/docs/check-see-paths.py.

Rust doc comments

When adding a new top-level function, type, module, trait, or pub/pub(crate) item in the Rust CLI, add a /// doc comment justifying why it exists at that boundary. Capture intent, invariant, ownership, or call-site coupling – not the signature.

Prefer one to three lines. If removing the comment would not lose any information a reader could not recover from the code, do not write it.

Skip:

  • Trait impls whose purpose is the trait (Display, Debug, From, Default, …)
  • Enum variants already covered by an enum-level doc
  • #[cfg(test)] items and test fixtures

Good:

  • “Shared mapper ownership classifier so planner and executor use the same LUKS UUID invariant.”
  • “Separate from MountState because we observe LUKS state without holding the pool lock.”

Bad:

  • “Returns mapper ownership.” (restates signature)
  • “Helper used by the planner.” (vague)
  • “Caller must ensure path is canonical.” (fabricated invariant nothing enforces)
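In source form, the first good example above might sit on an item like the one below. The function, its parameters, and the `MapperOwner` type are hypothetical and exist only to give the comment something to attach to; only the doc comment text comes from the guidance.

```rust
/// Whether a mapper's LUKS UUID belongs to this pool (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
enum MapperOwner {
    Pool,
    Foreign,
}

// Bad: `/// Returns mapper ownership.` only restates the signature.
// Good: the comment below says why the function exists at this boundary.

/// Shared mapper ownership classifier so planner and executor use the same
/// LUKS UUID invariant.
fn classify_mapper(luks_uuid: &str, pool_uuids: &[&str]) -> MapperOwner {
    if pool_uuids.contains(&luks_uuid) {
        MapperOwner::Pool
    } else {
        MapperOwner::Foreign
    }
}
```

The enum variants carry no comments of their own because the enum-level doc already covers them, in line with the skip list.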

Rust CLI only. Nix module options use NixOS option description fields; shell scripts and Python tests follow their own conventions (see testing.md).