braid
braid is a NixOS CLI tool for managing an encrypted btrfs RAID1 NAS. These docs cover end-user workflows, command reference, design decisions, internals, and development practices.
Common tasks
- First time setup – Getting started
- Add, remove, or replace a disk – add, remove, replace
- USB auto-unlock – Auto-unlock
- Set up disk health alerts – Monitoring and alerts
- Suspend when idle, wake on demand – Power management
Guides
| Guide | Description |
|---|---|
| Install NixOS | Install NixOS itself before setting up braid |
| Getting started | First-time setup: find disks, create pool, unlock |
| Day-to-day NAS usage | Subvolumes, file permissions, Samba shares |
| Auto-unlock | USB keyfile setup for unattended reboots |
| Monitoring and alerts | Disk health alerts, beeper, alert commands |
| Power management | Auto-suspend, Wake-on-LAN, RTC wakeups |
| Fan control | HDD-driven chassis fan control, SATA hotswap |
| UPS | NUT-backed orderly poweroff, preflight safety, live status |
| NixOS configuration | Module options, scrub scheduling, pinned toolchain |
| Sharing and permissions | Storage group, mount permissions, Samba |
| Mounting subvolumes | Expose a btrfs subvolume at a custom path |
| Troubleshooting | ENOSPC balance, paused balance, missing devices |
| Recovery scenarios | Interrupted operations, lost pool.json, degraded mount |
Commands
Commands marked 🧪 are experimental: the idea or implementation is still uncertain and may be removed, replaced, or overhauled before braid v1.0.
| Command | Description |
|---|---|
| add | Add disks to the pool or create a new pool |
| remove | Remove a live disk from the pool |
| remove-missing | Forget a dead or missing device entry |
| replace | Replace a live or dead disk |
| unlock | Open LUKS devices and mount the pool |
| lock | Unmount the pool and close LUKS devices |
| seal-mountpoint 🧪 | Seal the offline mountpoint immutable (boot-managed) |
| idle 🧪 | Check if the pool is idle for auto-suspend |
| status | Pool health, disk status, allocation, scrub info |
| doctor | Diagnostic checks for config and pool health |
| monitor 🧪 | Health check for alerting used by systemd timer |
| ack 🧪 | Acknowledge and silence an active alert |
| enroll 🧪 | Enroll a USB keyfile for auto-unlock |
| discover 🧪 | Scan for braid LUKS devices and rebuild pool.json |
| recover 🧪 | Recover from an interrupted operation |
| tui | Interactive dashboard with raw-output Browse tab |
| ups status 🧪 | Live UPS state from NUT, with JSON for scripts |
Design
| Doc | Purpose |
|---|---|
| Principles | Authoritative invariants for braid behavior |
| Decision records | Rationale, history, and rejected alternatives |
Internals
| Doc | Purpose |
|---|---|
| LUKS unlock | Unlock, header backup, and recovery-message contract |
| Device disappearance | External-tool output for missing device states |
| SATA hot-unplug | Real hardware observations for hot-unplug behavior |
| btrfs notes | btrfs RAID profile, balance, ENOSPC, and LUKS notes |
Development
| Doc | Purpose |
|---|---|
| Overview | Development workflow and dependency updates |
| Testing | VM test conventions and framework gotchas |
| TUI snapshots | Ratatui and Insta snapshot review workflow |
Install NixOS
You can also follow NixOS’ own installation guide.
I’ll document the process and the post-install setup, mostly for my own notes.
Download NixOS image
- Go to https://nixos.org/download/#nix-install-linux
- Scroll down to ISO image section
- Download the Graphical 64-bit Intel/AMD image
This guide uses the graphical installer’s wizard for partitioning, swap, and user creation. The Minimal ISO is out of scope here – if you prefer it, follow NixOS’ install guide instead.
Format USB stick with NixOS image
- Download Etcher
- Plug in USB stick
- Use Etcher to write your downloaded ISO image to your USB stick
Install NixOS on NAS computer
- Plug in USB stick and boot from it
- Choose “Install NixOS (Linux LTS)” – this launches the graphical installer
- Click through the wizard. It handles partitioning, swap, and user creation for you.
- Reboot when done; unplug the USB stick so it doesn’t boot from it again
Post-install
Enable SSH
The graphical installer doesn’t enable SSH by default. Log in physically on the NAS console with your user, then add openssh:
sudo nano /etc/nixos/configuration.nix
# add: services.openssh.enable = true;
sudo nixos-rebuild switch
Find the NAS’s LAN IP and SSH in from your laptop:
# On NAS
ip a # look for LAN ip address
# On your laptop
ssh user@<nas-ip>
# Once logged in on NAS, change your password
passwd
The rest of this guide takes place over SSH from your laptop.
Install vim
We’ll add more packages later. For now I just want vim on the system to make the rest of the setup easier.
sudo nano /etc/nixos/configuration.nix
environment.systemPackages = with pkgs; [ vim ];
sudo nixos-rebuild switch
Make git repo for NixOS config
The beauty of nix is that your OS is configured by git-diffable config files.
Instead of editing /etc/nixos/*.nix files, I like to have a ~/world git repo that tracks the NAS’s nix config and push it to danneu/world.
I’ll name my NAS “nasbox” here.
~/world/
├── flake.nix
└── hosts/
    └── nasbox/                         # NAS (NixOS)
        ├── configuration.nix           # System config (boot, networking, services)
        ├── hardware-configuration.nix
        └── home.nix                    # User config (packages, shell, git, etc.)
Let’s stub out that folder tree:
mkdir -p ~/world/hosts/nasbox
We use home-manager to manage user-level config (packages, git, shell, etc.) separately from the system config. This keeps configuration.nix lean — just boot, networking, and services — while home.nix handles everything specific to your user.
In ~/world/flake.nix:
{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-26.05";
    home-manager.url = "github:nix-community/home-manager/release-26.05";
    home-manager.inputs.nixpkgs.follows = "nixpkgs";
  };
  outputs = { nixpkgs, home-manager, ... }: {
    nixosConfigurations.nasbox = nixpkgs.lib.nixosSystem {
      system = "x86_64-linux";
      modules = [
        ./hosts/nasbox/configuration.nix
        home-manager.nixosModules.home-manager
        {
          home-manager.useGlobalPkgs = true;
          home-manager.useUserPackages = true;
          home-manager.users.dan = import ./hosts/nasbox/home.nix;
        }
      ];
    };
  };
}
Copy the generated NixOS config into your world repo:
cp /etc/nixos/configuration.nix ~/world/hosts/nasbox/
cp /etc/nixos/hardware-configuration.nix ~/world/hosts/nasbox/
Make sure hosts/nasbox/configuration.nix imports the hardware config with a relative path:
imports = [ ./hardware-configuration.nix ];
Create hosts/nasbox/home.nix for your user-level config:
{ pkgs, ... }:
{
  home.username = "dan";
  home.homeDirectory = "/home/dan";
  home.stateVersion = "26.05";
  programs.home-manager.enable = true;
  home.sessionVariables = {
    EDITOR = "vim";
    VISUAL = "vim";
  };
  programs.git = {
    enable = true;
    userName = "Your Name";
    userEmail = "you@example.com";
    extraConfig = {
      init.defaultBranch = "master";
      pull.rebase = true;
      push.autoSetupRemote = true;
    };
  };
  home.packages = with pkgs; [
    lazygit # Terminal UI for git
    ripgrep # Fast recursive grep (rg)
    fd      # Fast find alternative
    jq      # JSON processor
    htop    # Interactive process viewer
  ];
}
Now rebuild from the flake instead of /etc/nixos:
sudo nixos-rebuild switch --flake ~/world#nasbox
From now on, you edit ~/world/ as your normal user and only sudo for the rebuild. System-level config goes in configuration.nix; user-level config goes in home.nix.
Set up git and push to GitHub
Generate an SSH key on the NAS and add it to GitHub so you can push/pull:
ssh-keygen -t ed25519 -C "nasbox"
cat ~/.ssh/id_ed25519.pub
Copy the public key and add it at GitHub > Settings > SSH and GPG keys > New SSH key.
Then init and push:
cd ~/world
git init
git add -A
git commit -m "initial nixos config"
git remote add origin git@github.com:danneu/world.git
git push -u origin master
Set hostname and pin the IP
Edit ~/world/hosts/nasbox/configuration.nix:
networking.hostName = "nasbox";
sudo nixos-rebuild switch --flake ~/world#nasbox
For a stable IP, the simplest approach is a DHCP reservation on your router: look up the NAS’s MAC address (ip link show <iface>) and tell the router to always hand it the same address. The reservation lives on the router, not the host – no nix changes and no nixos-rebuild needed. Bonus: it survives interface renames.
If you’d rather pin it on the host, add to configuration.nix:
networking.interfaces.eno1.ipv4.addresses = [{
  address = "192.168.1.158";
  prefixLength = 24;
}];
networking.defaultGateway = "192.168.1.1";
networking.nameservers = [ "1.1.1.1" "8.8.8.8" ];
Then rebuild:
sudo nixos-rebuild switch --flake ~/world#nasbox
Set up SSH key auth
On your laptop, copy your public key to the NAS:
ssh-copy-id user@<nas-ip>
Now you can SSH in without a password. Optionally disable password auth in configuration.nix:
services.openssh = {
  enable = true;
  settings.PasswordAuthentication = false;
};
Set up Claude Code and Codex
numtide/llm-agents.nix is a daily-updated nix flake that packages 40+ AI coding agents, including Claude Code and OpenAI’s Codex CLI. It exposes them via an overlay under pkgs.llm-agents.*.
Add it as a flake input in ~/world/flake.nix and apply its overlay:
{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-26.05";
    home-manager.url = "github:nix-community/home-manager/release-26.05";
    home-manager.inputs.nixpkgs.follows = "nixpkgs";
    # No `follows = "nixpkgs"` -- llm-agents is built against nixpkgs-unstable.
    llm-agents.url = "github:numtide/llm-agents.nix";
  };
  outputs = { nixpkgs, home-manager, llm-agents, ... }: {
    nixosConfigurations.nasbox = nixpkgs.lib.nixosSystem {
      system = "x86_64-linux";
      modules = [
        ./hosts/nasbox/configuration.nix
        { nixpkgs.overlays = [ llm-agents.overlays.default ]; }
        home-manager.nixosModules.home-manager
        {
          home-manager.useGlobalPkgs = true;
          home-manager.useUserPackages = true;
          home-manager.users.dan = import ./hosts/nasbox/home.nix;
        }
      ];
    };
  };
}
First, allow unfree packages in hosts/nasbox/configuration.nix:
nixpkgs.config.allowUnfree = true;
Then add both binaries to the existing home.packages list in hosts/nasbox/home.nix:
home.packages = with pkgs; [
  lazygit
  ripgrep
  fd
  jq
  htop
] ++ (with pkgs.llm-agents; [
  claude-code
  codex
]);
Rebuild:
sudo nixos-rebuild switch --flake ~/world#nasbox
Now you can run claude and codex from anywhere on the NAS.
Optional: use the numtide binary cache
By default, source-built agents like codex will compile locally on first install. To pull prebuilt binaries instead, add the numtide cache to your system config (in hosts/nasbox/configuration.nix):
nix.settings = {
  extra-substituters = [ "https://cache.numtide.com" ];
  extra-trusted-public-keys = [
    "niks3.numtide.com-1:DTx8wZduET09hRmMtKdQDxNNthLQETkc/yaX7M4qK0g="
  ];
};
Then rebuild. Subsequent installs will fetch from the cache.
Next steps
At this point you have a working NixOS machine with SSH access, a stable IP, Claude Code + Codex, and a git-tracked config. Next: add braid to your NixOS config.
Getting started
This guide walks you through first-time braid setup: installing the NixOS module, finding your disks, creating a pool, and unlocking it.
Read this if you have a fresh NixOS machine with empty drives and want to set up an encrypted RAID1 NAS.
What braid manages
braid owns two things:
- LUKS encryption – each drive is individually encrypted with a shared passphrase. Keys are never stored on disk.
- btrfs RAID1 – your encrypted drives form a single filesystem with automatic redundancy and self-healing checksums.
The NixOS module provides the systemd units, mount point, and toolchain. The CLI owns which disks are in the pool – adding or removing a drive is a braid command, not a nixos-rebuild.
Pool membership lives in /var/lib/braid/pool.json. This file is created by braid add and read by braid unlock. It is keyed by each member’s LUKS UUID; the disk name is stored inside each entry for commands and display.
Install the NixOS module
Add braid to your flake inputs and import the module:
# flake.nix
{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-26.05";
    braid.url = "github:danneu/braid?ref=release";
  };
  outputs = { nixpkgs, braid, ... }: {
    nixosConfigurations.myhost = nixpkgs.lib.nixosSystem {
      system = "x86_64-linux";
      modules = [
        braid.nixosModules.default
        ./configuration.nix
      ];
    };
  };
}
?ref=release follows braid’s release channel; nix flake update braid upgrades
to the newest release. The snippet also keeps braid on its own pinned nixpkgs (no
follows override) on purpose, so braid matches its binary cache (next section).
Minimal configuration:
# configuration.nix
braid = {
  enable = true;
  mountPoint = "/mnt/storage"; # default
};
Only braid.enable = true is required. nixosModules.default defaults
braid.package to braid’s pinned braid-cli-unwrapped; mountPoint defaults
to /mnt/storage.
Binary cache
braid publishes prebuilt binaries to a public Cachix cache. Add it before rebuilding so the NAS pulls the CLI instead of compiling Rust:
# configuration.nix
nix.settings = {
  extra-substituters = [ "https://braid.cachix.org" ];
  extra-trusted-public-keys = [ "braid.cachix.org-1:I/p7fx1z5n0+O80KzMuT7aXRdkVyHr/buZKaBu7HvJs=" ];
};
This relies on the no-follows input above – the cache only matches braid’s
pinned nixpkgs. See NixOS configuration.
Rebuild and switch:
sudo nixos-rebuild switch
Find your disks
Use lsblk to identify the drives you want to add:
lsblk -d -o NAME,SIZE,MODEL,ID-LINK
Example output:
NAME SIZE MODEL ID-LINK
sda 12T TOSHIBA MN07ACA12T ata-TOSHIBA_MN07ACA12T_XXXX
sdb 12T TOSHIBA MN07ACA12T ata-TOSHIBA_MN07ACA12T_YYYY
sdc 12T TOSHIBA MN07ACA12T ata-TOSHIBA_MN07ACA12T_ZZZZ
sdd 500G Samsung SSD 860 ata-Samsung_SSD_860_AAAA # boot drive -- leave this alone
You need the ID-LINK values. braid always uses /dev/disk/by-id/ paths – never /dev/sdX, which can change between reboots.
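As a rough illustration of that rule, the check below accepts only /dev/disk/by-id/ paths. The function name is_by_id_path is hypothetical and this is not braid's own validation code, only a sketch you could use in your own scripts before passing a path to braid.

```shell
# Hypothetical helper, not part of braid: accept only stable /dev/disk/by-id/ paths.
is_by_id_path() {
  case "$1" in
    /dev/disk/by-id/?*) return 0 ;;  # stable name derived from model and serial
    *) return 1 ;;                   # e.g. /dev/sdb, which can change between reboots
  esac
}

# Example: classify one stable path and one kernel-assigned name.
for p in /dev/disk/by-id/ata-TOSHIBA_MN07ACA12T_XXXX /dev/sdb; do
  if is_by_id_path "$p"; then
    echo "ok:     $p"
  else
    echo "reject: $p"
  fi
done
```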
Create the pool
Add your drives with braid add. Each drive gets a short name you choose and a by-id path:
sudo braid add \
toshiba1=/dev/disk/by-id/ata-TOSHIBA_MN07ACA12T_XXXX \
toshiba2=/dev/disk/by-id/ata-TOSHIBA_MN07ACA12T_YYYY \
toshiba3=/dev/disk/by-id/ata-TOSHIBA_MN07ACA12T_ZZZZ
braid will:
- Ask you to set a LUKS passphrase (used for all drives).
- Format each drive with LUKS encryption.
- Create a btrfs RAID1 filesystem across all drives.
- Mount the pool at /mnt/storage.
- Write pool membership to /var/lib/braid/pool.json.
All drives join the same btrfs RAID1 filesystem. btrfs RAID1 keeps exactly 2 copies of every block regardless of how many drives you add, so the pool tolerates a single drive failure – a 3-drive pool tolerates the same single failure as a 2-drive pool, with more usable capacity. See Day-to-day NAS usage for what additional drives buy you and how to add them later.
The disk names (toshiba1, toshiba2, etc.) are permanent presentation labels used in all future commands. Pick something short and meaningful. braid uses the LUKS UUID, not the name or LUKS label, as the persistent disk identity.
After braid add completes, the pool is online and mounted. You can start using it immediately:
ls /mnt/storage/
cp photos/* /mnt/storage/photos/
Unlock after reboot
When the NAS reboots, the pool is offline – LUKS drives are closed and nothing is mounted. This is by design: your data stays encrypted until you explicitly unlock it.
SSH into the NAS and unlock:
ssh user@nas
sudo braid unlock
braid prompts for your LUKS passphrase, opens all drives, assembles the btrfs pool, and mounts it.
Check pool health
sudo braid status
This shows:
- Pool state (online/offline)
- Each disk’s health and LUKS status
- Disk allocation and free space
- Scrub status
- Any active alerts
Lock the pool
When you want to take the pool offline (unmount and close LUKS):
sudo braid lock
This is optional – the pool locks automatically on shutdown. Manual locking is useful before maintenance or if you want to ensure the drives are encrypted at rest while the machine stays on.
What’s next
- Day-to-day NAS usage – the reboot/unlock/use cycle, subvolumes, good habits
- Auto-unlock – USB keyfile for unattended reboots
- Monitoring and alerts – get notified when a disk has problems
- Power management – auto-suspend and Wake-on-LAN
Related commands
Day-to-day NAS usage
This guide covers normal operation: the reboot cycle, checking on your pool, adding disks over time, organizing your data with subvolumes, and good operator habits.
Read this after you have completed Getting started and have a working pool.
The daily cycle
A typical NAS session looks like this:
NAS powers on
-> boots to login (pool offline, drives encrypted)
-> SSH in
-> sudo braid unlock (enter passphrase)
-> pool online at /mnt/storage
-> use it (copy files, stream media, backups)
-> pool locks automatically on shutdown
If you have auto-unlock configured with a USB keyfile, the unlock step happens automatically at boot.
Unlock
ssh user@nas
sudo braid unlock
braid prompts for your LUKS passphrase, opens all drives, and mounts the pool.
Use the pool
The pool is a normal directory at /mnt/storage (or whatever you set braid.mountPoint to). Copy files, create directories, share via Samba/NFS – it works like any other filesystem.
cp ~/photos/* /mnt/storage/photos/
rsync -av ~/documents/ /mnt/storage/documents/
Lock
sudo braid lock
This unmounts the pool and closes all LUKS devices. The pool locks automatically on shutdown, so manual locking is only needed if you want drives encrypted while the machine stays on.
Checking pool health
Run braid status periodically to check on your pool:
sudo braid status
This shows pool state, disk health, allocation, scrub history, and any active alerts. Make a habit of glancing at this after unlocking, especially if the NAS has been running unattended.
For an interactive view:
sudo braid tui
The TUI dashboard shows pool health, disk status, balance progress, and SMART data in a live-updating terminal interface.
Organizing data with subvolumes
btrfs subvolumes are the right way to organize different categories of data on your NAS. Think of them as lightweight partitions within the pool.
Subvolumes vs directories
A plain directory works fine for storing files, but subvolumes give you:
- Independent snapshots – snapshot your documents without snapshotting your movie library.
- Per-subvolume quotas – limit how much space a category can use (optional).
- Selective backup – send/receive individual subvolumes to an external drive.
- No cost upfront – subvolumes are free to create. They share the pool’s space with no pre-allocated size.
There is no downside to creating subvolumes early. If you later decide you do not need snapshots for a category, the subvolume still works exactly like a directory.
Creating subvolumes
Create subvolumes for your major data categories:
sudo btrfs subvolume create /mnt/storage/documents
sudo btrfs subvolume create /mnt/storage/photos
sudo btrfs subvolume create /mnt/storage/movies
sudo btrfs subvolume create /mnt/storage/music
sudo btrfs subvolume create /mnt/storage/backups
Then use them like normal directories:
cp ~/report.pdf /mnt/storage/documents/
rsync -av ~/Photos/ /mnt/storage/photos/
Snapshots
Snapshot a subvolume to create a point-in-time copy:
sudo btrfs subvolume snapshot -r /mnt/storage/documents /mnt/storage/.snapshots/documents-2026-04-09
The -r flag makes it read-only, which is best practice for backup snapshots. Snapshots are nearly instant and use no extra space until the original data changes. Deleting a file from the original subvolume does not reclaim its blocks while any snapshot still references them. To free that space, delete the snapshots holding the data with sudo btrfs subvolume delete /mnt/storage/.snapshots/<name>.
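If you snapshot several subvolumes regularly, a small helper can build the dated command for you. This is a sketch only: snapshot_cmd is a hypothetical function name, it assumes /mnt/storage/.snapshots already exists, and it prints the commands (a dry run) rather than executing them.

```shell
# Hypothetical helper: print the read-only snapshot command for one subvolume.
# Arguments: pool mount point, subvolume name, date string (YYYY-MM-DD).
snapshot_cmd() {
  local pool="$1" subvol="$2" date="$3"
  printf 'btrfs subvolume snapshot -r %s/%s %s/.snapshots/%s-%s\n' \
    "$pool" "$subvol" "$pool" "$subvol" "$date"
}

# Dry run: print today's snapshot commands for two subvolumes.
for sv in documents photos; do
  snapshot_cmd /mnt/storage "$sv" "$(date +%F)"
done
```

Review the printed lines, then run them with sudo once they look right.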
Listing subvolumes
sudo btrfs subvolume list /mnt/storage
To mount a specific subvolume at a custom path, for example for a service or a
friendlier path under /home, see Mounting subvolumes.
Adding disks over time
You can add new drives to an existing pool without rebuilding or reformatting:
# Find the new drive
lsblk -d -o NAME,SIZE,MODEL,ID-LINK
# Add it
sudo braid add newdisk=/dev/disk/by-id/ata-NEWDISK_SERIAL
braid formats the new drive with LUKS (using your existing passphrase), adds it to the btrfs pool, and rebalances data across all drives. No nixos-rebuild required.
The balance runs in the foreground – braid add holds the terminal and does not return until it finishes, which can take hours on a large pool. braid shows live balance progress while it runs.
btrfs RAID1 keeps exactly 2 copies of every block no matter how many drives the pool has. A 3rd or 4th drive gives you more usable capacity, but it does not increase fault tolerance – the pool still tolerates a single drive failure, the same as a 2-drive pool. See Decision 001 for the rationale.
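To estimate what an extra drive buys you, the sketch below applies the usual two-copy rule of thumb: usable space is the smaller of half the total and the total minus the largest drive. The function name raid1_usable is hypothetical, sizes are whole numbers in any one unit, and metadata overhead is ignored, so treat the result as an approximation.

```shell
# Hypothetical estimator for a two-copy btrfs RAID1 pool.
# Arguments: one integer size per drive, all in the same unit (e.g. TB).
raid1_usable() {
  local total=0 largest=0 s half rest
  for s in "$@"; do
    total=$((total + s))
    if [ "$s" -gt "$largest" ]; then largest=$s; fi
  done
  half=$((total / 2))        # every block is stored twice
  rest=$((total - largest))  # the largest drive needs partners for its second copies
  if [ "$half" -lt "$rest" ]; then echo "$half"; else echo "$rest"; fi
}

raid1_usable 12 12      # two 12 TB drives: prints 12
raid1_usable 12 12 12   # three 12 TB drives: prints 18
```

With mismatched drives the second term can dominate: one 12 TB drive plus two 4 TB drives gives about 8 TB usable, not 10.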
Responding to alerts
If the NAS beeps (or sends you an alert via a custom command), something needs attention:
- SSH in and check status:
  sudo braid status
- The status output shows what triggered the alert: btrfs device errors, a missing disk, or a SMART warning.
- Investigate and fix the issue (replace a failing disk, check cables, etc.).
- Once resolved, acknowledge the alert to silence it:
  sudo braid ack
See Monitoring and alerts for details on how alerts work.
Good operator habits
- Check braid status after unlocking – a quick glance catches problems early.
- Keep LUKS header backups – braid stores header backups in /var/lib/braid/luks-headers/ after operations that modify LUKS headers. Copy each .luksheader file off the NAS to a separate location, then delete the local file (braid status warns until they are removed). If a drive’s LUKS header is corrupted and you have no off-system backup, the data on that drive is unrecoverable.
- Run braid doctor – periodically check for configuration problems:
  sudo braid doctor
- Let scrubs complete – braid runs monthly scrubs by default. Scrubs verify every block’s checksum and repair corruption from redundant copies. braid starts them at low CPU priority (Nice=19) and idle I/O priority (IOSchedulingClass=idle). The CPU priority always applies; the I/O priority is best-effort – how strongly the kernel honors it depends on your block-layer I/O scheduler – so do not treat it as a guarantee that scrubs will never affect interactive workloads. The pool stays online throughout. If scrubs noticeably impact Samba, NFS, or local use on your hardware, retime them with braid.autoScrub.interval (any systemd calendar expression – e.g. "Sun *-*-* 02:00:00") to land in an off-peak window. Do not interrupt a scrub in progress.
- Create subvolumes early – there is no cost to creating them upfront, and you cannot convert a directory to a subvolume later without copying the data.
What’s next
- Auto-unlock – skip the manual passphrase step on boot
- Mounting subvolumes – expose a subvolume at a custom path
- Monitoring and alerts – automatic health checking and notifications
- Power management – suspend when idle, wake on demand
Related commands
Auto-unlock
This guide covers setting up unattended unlock with a USB keyfile so the pool comes online automatically at boot.
Read this if you want the NAS to unlock without SSH-ing in to type a passphrase – for example, after a power outage or scheduled reboot.
How it works
By default, braid requires a passphrase to unlock. Auto-unlock adds a binary keyfile stored on a USB drive as a second LUKS unlock method. At boot, braid mounts the USB, reads the keyfile, unlocks all drives, and unmounts the USB.
The passphrase (LUKS slot 0) still works for manual unlock. The keyfile lives in LUKS slot 1.
Boot behavior
With USB key present: NAS boots, braid-auto-unlock.service runs, mounts USB, unlocks pool, unmounts USB. Pool is online by the time you can SSH in.
Without USB key present: The service waits for the USB device (up to timeoutSec, default 5 seconds), then skips gracefully. The pool stays locked. SSH in and run sudo braid unlock with your passphrase as usual.
This means removing the USB key is all it takes to go back to manual unlock.
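The wait-then-skip behavior can be pictured as a simple polling loop. The sketch below is an illustration of the idea, not braid's actual implementation; wait_for_device is a hypothetical name, and it tests for plain existence (-e) where a real check would more likely test for a block device (-b).

```shell
# Hypothetical sketch: poll for a device path, giving up after a timeout.
# Arguments: device path, timeout in whole seconds.
# Returns 0 if the path appeared in time, 1 otherwise.
wait_for_device() {
  local dev="$1" timeout="$2" waited=0
  while [ ! -e "$dev" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1   # key never showed up: skip and leave the pool locked
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# Usage (commented out so that sourcing this file does not block):
# if wait_for_device /dev/disk/by-id/usb-SanDisk_Cruzer_XXXX-0:0-part1 5; then
#   echo "USB key present"
# else
#   echo "no USB key, pool stays locked"
# fi
```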
Step 1: Generate and enroll the keyfile
Plug in a USB drive and find its by-id path:
lsblk -d -o NAME,SIZE,MODEL,ID-LINK
Mount it somewhere temporary:
sudo mount /dev/disk/by-id/usb-SanDisk_Cruzer_XXXX-0:0-part1 /mnt/usb
Generate a random keyfile and enroll it into all pool disks:
sudo braid enroll /mnt/usb --generate
--generate requires /mnt/usb to already be mounted. This creates a 4096-byte random file at /mnt/usb/braid.key and enrolls it into LUKS slot 1 on every disk in the pool. braid asks for your existing passphrase to authorize the enrollment.
Unmount the USB:
sudo umount /mnt/usb
Enroll during braid add
If you are adding a new disk and already have a keyfile on USB, you can enroll in one step:
sudo braid add newdisk=/dev/disk/by-id/ata-NEWDISK_SERIAL --enroll /mnt/usb
The --enroll flag points to the directory containing braid.key. The new disk gets both the passphrase and the keyfile.
Step 2: Enable auto-unlock in NixOS config
Find your USB key’s by-id path (use the raw device, not a partition, if your USB has no partition table):
lsblk -d -o NAME,SIZE,MODEL,ID-LINK
Add to your NixOS configuration:
# configuration.nix
braid = {
  enable = true;
  autoUnlock = {
    enable = true;
    keyDevice = "/dev/disk/by-id/usb-SanDisk_Cruzer_XXXX-0:0-part1";
  };
};
Rebuild:
sudo nixos-rebuild switch
Configuration options
| Option | Default | Description |
|---|---|---|
| autoUnlock.enable | false | Enable USB keyfile auto-unlock |
| autoUnlock.keyDevice | – | Block device for the USB key (must use /dev/disk/by-id/ path) |
| autoUnlock.timeoutSec | 5 | Seconds to wait for USB device before giving up |
| autoUnlock.allowDegraded | false | Mount even if some drives are missing (degraded mode) |
keyDevice must be a /dev/disk/by-id/ path. braid rejects /dev/sdX paths because they can change between reboots.
Degraded mode
By default, auto-unlock refuses to mount if any pool drive is missing. This prevents silent operation with zero redundancy.
If you want the pool to come online even with a missing drive (for example, if a drive has failed and you plan to replace it), set:
autoUnlock.allowDegraded = true;
Redundancy is reduced until the drive is replaced and data rebalances.
Step 3: Test it
Reboot the NAS with the USB key plugged in:
sudo reboot
After boot, SSH in and check:
sudo braid status
The pool should be online. Check the journal to confirm auto-unlock ran:
journalctl -u braid-auto-unlock.service
Then test without the USB key: remove it, reboot, and confirm the pool stays locked until you manually unlock.
Security considerations
The keyfile on the USB drive can unlock your pool without a passphrase. Treat it like a physical key:
- Remove the USB after boot. The auto-unlock service unmounts the USB immediately after reading the keyfile, but physically removing it ensures no one can copy the key from a running system.
- Store the USB securely. If someone has physical access to both the USB key and the NAS drives, they can decrypt your data.
- Keep a backup of the keyfile. If you lose the USB key, you still have your passphrase. But if you want another auto-unlock USB, you need to braid enroll again.
LUKS header backups
After enrolling a keyfile, braid modifies the LUKS header on each drive (adding slot 1). braid stores LUKS header backups in /var/lib/braid/luks-headers/ as a transient byproduct.
Copy each .luksheader file to a separate location (external drive, another machine), then delete the local file. braid status warns until the local copies are removed. If a drive’s LUKS header is corrupted, the off-system backup is the only way to recover access to that drive’s data.
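The copy-then-delete step is easy to get wrong if a copy silently fails, so one cautious approach is to remove the local file only after a byte-for-byte comparison. The sketch below assumes the destination is an already-mounted external location; offload_headers and the example destination path are hypothetical, and reading /var/lib/braid/luks-headers/ will likely need root.

```shell
# Hypothetical helper: copy each .luksheader file to a destination directory,
# verify the copy, and only then delete the local file.
# Arguments: source directory, destination directory.
offload_headers() {
  local src="$1" dest="$2" f name
  for f in "$src"/*.luksheader; do
    [ -e "$f" ] || continue                  # glob matched nothing
    name=$(basename "$f")
    cp "$f" "$dest/$name" || return 1
    cmp -s "$f" "$dest/$name" || return 1    # refuse to delete on any mismatch
    rm "$f"
  done
}

# Usage (destination path is an assumption):
# sudo bash -c '. ./offload.sh; offload_headers /var/lib/braid/luks-headers /mnt/external/braid-headers'
```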
What’s next
- Monitoring and alerts – get notified when a disk has problems
- Power management – auto-suspend and Wake-on-LAN
Related commands
Monitoring and alerts
This guide covers how braid monitors disk health and notifies you when something goes wrong.
Read this if you want to understand the alert system, configure notifications, or respond to an alert.
How monitoring works
braid runs a health check every 5 minutes via a systemd timer. The check looks at three things:
- btrfs device stats – non-zero error counters (read, write, flush, corruption, generation errors) on any drive.
- Missing devices – a drive that should be in the pool but is not present.
- SMART alerts – smartd detected a SMART health warning on a drive.
A scrub that discovers unrepairable read, checksum, or generation errors increments the same btrfs device stats, so it follows the same beep and braid status flow as an everyday I/O error.
If any check triggers, braid activates an alert.
What happens on alert
When braid monitor detects an issue (exit code 1), the systemd wrapper starts braid-alert.service, which:
- Beeps the PC speaker (if enabled) until acknowledged. The cadence starts at 5 seconds and backs off exponentially (5s, 10s, 20s, 40s, …) up to once every 15 minutes, so the early beeps are urgent but an ignored alert doesn’t stay obnoxious.
- Runs your custom alert command (if configured).
The beeping is intentionally persistent and annoying – you should not be able to ignore a disk problem on a NAS that holds your data.
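To get a feel for the cadence described above, the sketch below prints the gap before each successive beep, doubling from 5 seconds and capping at 900 seconds (15 minutes). It illustrates the schedule only; beep_intervals is a hypothetical name, and clamping the doubled value to the cap is an assumption about how the limit is applied.

```shell
# Hypothetical sketch: print the first N gaps (in seconds) of an exponential
# backoff that starts at 5s and is capped at 15 minutes.
beep_intervals() {
  local n="$1" delay=5 cap=900 out=""
  while [ "$n" -gt 0 ]; do
    out="$out${out:+ }$delay"
    delay=$((delay * 2))
    if [ "$delay" -gt "$cap" ]; then delay=$cap; fi
    n=$((n - 1))
  done
  echo "$out"
}

beep_intervals 10   # prints: 5 10 20 40 80 160 320 640 900 900
```

Under these assumptions the cap is reached on the ninth gap, roughly 21 minutes after the alert starts.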
Alerts are latched
An alert stays active until you acknowledge it with braid ack, even if the triggering condition goes away. This is by design: “a disk had errors” is worth investigating even if the error count stopped growing.
Configuration
Monitoring is on by default when braid.enable = true. Here is the full set of options:
braid = {
  enable = true;
  monitor = {
    enable = true;       # default: true
    interval = "5min";   # default: "5min" (systemd time span)
    beep = true;         # default: true (PC speaker alert)
    alertCommand = null; # default: null (optional custom command)
  };
};
Options
| Option | Default | Description |
|---|---|---|
| monitor.enable | true | Enable disk health monitoring |
| monitor.interval | "5min" | How often to check (systemd time span: "5min", "30s", "1h") |
| monitor.beep | true | Beep the PC speaker on alert |
| monitor.alertCommand | null | Custom command to run on alert (in addition to beep) |
Custom alert commands
Set monitor.alertCommand to run a script when an alert fires. This runs in addition to (not instead of) the beep:
braid.monitor.alertCommand = "/home/user/scripts/send-pushover-alert.sh";
The command runs as root. It should be idempotent – it may fire on every monitor cycle while the alert is active.
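Because the command may fire repeatedly, one way to keep it from spamming you is to rate-limit notifications with a timestamp file. This is a sketch under assumptions: should_notify, the stamp path, and the one-hour gap are all hypothetical choices, not something braid provides.

```shell
# Hypothetical helper: succeed at most once per min_gap seconds.
# Arguments: stamp file path, minimum gap in seconds.
should_notify() {
  local stamp="$1" min_gap="$2" now last=0
  now=$(date +%s)
  if [ -r "$stamp" ]; then last=$(cat "$stamp"); fi
  if [ $((now - last)) -lt "$min_gap" ]; then
    return 1               # notified recently: stay quiet this cycle
  fi
  echo "$now" > "$stamp"   # record this notification
}

# Usage inside your alert script (paths are assumptions):
# if should_notify /run/braid-alert-notified 3600; then
#   /home/user/scripts/send-pushover-alert.sh
# fi
```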
Disabling the beep
If you do not have a PC speaker or prefer silent alerts:
braid.monitor.beep = false;
You probably want to set a custom alertCommand if you disable the beep, otherwise alerts are silent and only visible in braid status.
SMART integration
braid automatically configures smartd to monitor all drives. When smartd detects a SMART health issue, it writes a flag file that braid’s monitor picks up on the next cycle.
You do not need to configure smartd yourself – braid sets it up with sensible defaults. The NixOS services.smartd options are still available if you need to customize behavior.
Alert workflow
When the NAS beeps (or your alert command fires):
1. SSH in and check status
ssh user@nas
sudo braid status
The output shows a banner when alerts are active and lists the causes:
- BtrfsDeviceErrors – a specific drive has non-zero error counters. Could be a bad cable, a dying drive, or a transient issue.
- MissingDevice – a drive is missing from the pool. Check if a cable came loose or if the drive failed.
- SmartdAlert – SMART reports a health warning. The drive may be failing.
2. Investigate
For device errors, check if they are growing:
# Wait a few minutes and check again
sudo braid status
Steady error counts after a reboot are often transient (power event, cable issue). Growing counts mean the drive is failing.
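If you prefer a single number to compare between checks, you can total the counters from the raw btrfs tool. The sketch below assumes the usual `btrfs device stats` output of one `[device].counter value` pair per line; sum_device_errors is a hypothetical helper, not a braid command.

```shell
# Hypothetical helper: sum every counter in `btrfs device stats` output
# read from stdin. Prints 0 for empty input.
sum_device_errors() {
  awk '{ total += $2 } END { print total + 0 }'
}

# Usage: record the total, wait a few minutes, and compare.
# sudo btrfs device stats /mnt/storage | sum_device_errors
```

A total that keeps rising between runs points at an ongoing problem rather than a one-off event.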
For a missing device, check physical connections. If the drive is dead, plan a replacement:
sudo braid replace --old deadname --new newname=/dev/disk/by-id/ata-NEW_SERIAL
3. Acknowledge
Once you have investigated and resolved (or accepted) the issue:
sudo braid ack
This silences the beep and resets the alert baseline. New errors after ack will trigger a fresh alert.
Checking monitor status
View the monitor service logs:
journalctl -u braid-monitor.service --since "1 hour ago"
View the alert service:
journalctl -u braid-alert.service
Check if the monitor timer is active:
systemctl status braid-monitor.timer
How the pieces fit together
braid-monitor.timer (every 5 min)
  -> braid-monitor.service
    -> braid monitor (exit 0 = ok/offline/lock-contended, 1 = alert, 2 = setup error)
      -> on exit 1: start braid-alert.service
        -> beep (PC speaker, 5s -> 10s -> ... -> 15min)
        -> alertCommand (if configured)

smartd (always running)
  -> detects SMART issue
    -> writes /var/lib/braid/smartd-alert flag
    -> starts braid-alert.service
    -> next braid monitor cycle picks up the flag

braid ack
  -> clears alert state
  -> braid-alert.service stops (beeping stops)
What’s next
- Power management – auto-suspend and Wake-on-LAN
- Day-to-day NAS usage – good operator habits
Related commands
Power management
This guide covers auto-suspend, Wake-on-LAN (WoL), and troubleshooting the hardware and software chain that makes it all work.
Read this if you want your NAS to sleep when idle and wake on demand.
How auto-suspend works
When enabled, braid uses autosuspend to suspend the entire NAS to RAM when idle. This stops all drives, the CPU, and fans – the machine draws almost no power. Wake it with a Wake-on-LAN magic packet from any device on the network.
Suspend-to-RAM preserves LUKS keys and the mounted btrfs pool in memory. When the NAS wakes, the pool is immediately available – no re-unlock needed.
What counts as activity
The NAS stays awake while any of these are true:
| Check | What it detects |
|---|---|
| braid idle | scrub plus any btrfs kernel exclusive operation (balance, device add/remove/replace, resize, swap activate) – the latter via /sys/fs/btrfs/<fsid>/exclusive_operation |
| braid wol-ready | configured wired NIC currently reports Wake-on: g; if WoL is disabled or unverifiable, auto-suspend is blocked |
| SSH | Active SSH connections (port 22) |
| Local sessions | TTY, X11, or Wayland sessions (via logind) |
| Samba | Active SMB clients (auto-detected, only if Samba is enabled) |
| NFS | Active NFS connections on port 2049 (auto-detected, only if NFS server is enabled) |
If all checks pass (everything idle) for the configured idle time (default 15 minutes), the NAS suspends.
The WoL check gates braid’s auto-suspend path only. Manual sudo systemctl suspend remains available for maintenance and testing, but it bypasses braid’s pre-suspend WoL check.
Scrub wakeups
The monthly btrfs scrub timer is registered as an autosuspend wakeup source. If the NAS is asleep when a scrub is due, it wakes via RTC alarm, runs the scrub, and suspends again when idle.
Configuration
braid = {
enable = true;
autoSuspend = {
enable = true;
idleTime = 900; # seconds before suspend (default: 900 = 15 min)
wolInterface = "eno1"; # required -- your wired ethernet interface
};
};
Options
| Option | Default | Description |
|---|---|---|
| autoSuspend.enable | false | Enable auto-suspend when idle |
| autoSuspend.idleTime | 900 | Seconds of idle time before suspending |
| autoSuspend.wolInterface | – | Wired ethernet interface for Wake-on-LAN (required) |
wolInterface is mandatory. Without WoL, a suspended NAS is unreachable until someone presses the power button. Find your interface with:
ip link
Look for your wired ethernet interface (usually eno1, enp1s0, or similar). WiFi interfaces (wl*) are rejected – WoL requires wired ethernet.
Hardware compatibility
Recommended: Intel NICs
Intel ethernet controllers (e.g., X540, I210, I225) have reliable WoL support with the in-kernel ixgbe, igb, and igc drivers respectively. These are the lowest-risk choice for a NAS that needs reliable remote wakeup.
Avoid: Aquantia/Marvell AQC107
The AQC107 (atlantic driver) has known WoL reliability issues on Linux. If WoL is important to you, avoid this chipset.
RTL8125 (Realtek 2.5GbE)
The Realtek RTL8125 works for WoL but requires the vendor r8125 driver instead of the in-kernel r8169. See the troubleshooting section below.
WoL troubleshooting
WoL involves a chain from BIOS to NIC driver to PCI bridge. When it does not work, you need to figure out which link in the chain is broken. Work through these steps in order.
1. Check BIOS settings
WoL must be enabled in your BIOS/UEFI. The exact option names vary by vendor, but look for:
- Wake on LAN or Wake on PCI/PCIe – enable this.
- ErP Ready or ErP Lot 6 – disable this. ErP mode implements an EU standby-power regulation by cutting standby power below what the NIC needs to listen for magic packets. If ErP is on, WoL cannot work.
- Deep Sleep – disable if present. Similar to ErP, this cuts power to PCIe slots during standby.
2. Test basic suspend first
Before debugging WoL, verify the NAS can suspend and wake at all:
# Check available sleep states
cat /sys/power/state
You should see mem in the output. Test a manual suspend and wake with the power button:
sudo systemctl suspend
Press the power button to wake. If this does not work, suspend itself is broken (check ACPI settings in BIOS).
Identify spurious wake sources
Sometimes the NAS wakes immediately after suspend. Check what woke it:
# What woke the system last time?
journalctl -b -k | grep -i "wake"
# List ACPI wake sources
cat /proc/acpi/wakeup
The output looks like:
Device S-state Status Sysfs node
XHC0 S3 *enabled pci:0000:00:14.0
GLAN S4 *enabled pci:0000:00:1f.6
...
Disable wake sources one at a time to find the culprit. Use a binary search – disable half, test, narrow down.
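To disable a wake source temporarily (resets on reboot), write its name to /proc/acpi/wakeup. Each write toggles the device between enabled and disabled, so run the command once and re-check the file afterwards: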
echo XHC0 | sudo tee /proc/acpi/wakeup
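When bisecting, it helps to list only the sources that are currently armed. A small sketch that filters the table format shown above; enabled_wake_sources is a hypothetical helper name.

```shell
# enabled_wake_sources: read /proc/acpi/wakeup-style text on stdin and
# print the names of devices whose Status column is "*enabled".
# The first line is the column header and is skipped.
enabled_wake_sources() {
    awk 'NR > 1 && $3 == "*enabled" { print $1 }'
}

# Example (not run here):
#   enabled_wake_sources < /proc/acpi/wakeup
```

Running it before and after each toggle confirms which state the device ended up in.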
Once you find the problematic device, disable it permanently via a udev rule in your NixOS config:
services.udev.extraRules = ''
# Disable USB controller wake (XHC0 causes spurious wakeups)
ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x8086", ATTR{device}=="0xa0ed", ATTR{power/wakeup}="disabled"
'';
Find the vendor and device IDs from lspci -nn for the corresponding PCI device.
3. Verify WoL is enabled on the NIC
After rebuild with braid.autoSuspend.wolInterface set, verify with doctor:
sudo braid doctor
Expected row:
[ok] wake-on-lan eno1 reports Wake-on: g (magic packet armed)
Wake-on: g means WoL is active (magic packet mode). If doctor reports Wake-on: d (disabled), the NixOS config is not taking effect – check that you rebuilt, that the interface name is correct, and that BIOS/driver WoL settings allow wake.
With autoSuspend enabled, braid also checks this before every automatic suspend. If the NAS is idle but does not sleep, run sudo braid doctor and inspect the wake-on-lan row.
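To cross-check the doctor row by hand, the same flag can be read from ethtool output. This is a sketch, assuming ethtool is installed and that its output contains a "Wake-on:" line for the interface; wol_mode and wol_armed are hypothetical helper names, not braid commands.

```shell
# wol_mode: read `ethtool IFACE` output on stdin and print the active
# Wake-on flags. Matches the "Wake-on:" line only, not "Supports Wake-on:".
wol_mode() {
    awk '$1 == "Wake-on:" { print $2 }'
}

# wol_armed: succeed when the active flags include "g" (magic packet).
wol_armed() {
    case "$(wol_mode)" in
        *g*) return 0 ;;
        *)   return 1 ;;
    esac
}

# Example (not run here; eno1 is the interface from the config above):
#   sudo ethtool eno1 | wol_armed && echo "magic packet armed"
```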
4. Test WoL from another machine
From a different machine on the same network, send a magic packet:
# Install wakeonlan tool (on the sending machine)
# NixOS: nix-shell -p wakeonlan
# macOS: brew install wakeonlan
# Get the NAS MAC address (on the NAS, before suspending)
ip link show eno1
# look for "link/ether xx:xx:xx:xx:xx:xx"
# Suspend the NAS
ssh user@nas sudo systemctl suspend
# Send the magic packet (from the other machine)
wakeonlan xx:xx:xx:xx:xx:xx
If the NAS wakes, WoL is working. If not, continue to the next steps.
5. NIC driver issues
RTL8125 (Realtek 2.5GbE)
The in-kernel r8169 driver handles the RTL8125 but has unreliable WoL. The vendor r8125 driver fixes this.
Add to your NixOS config:
boot.extraModulePackages = with config.boot.kernelPackages; [ r8125 ];
boot.blacklistedKernelModules = [ "r8169" ];
After rebuild, verify the driver:
ethtool -i eno1 | grep driver
# should show: driver: r8125
6. PCI bridge wakeup (PME propagation)
Even with WoL enabled on the NIC, the NIC’s wake signal (PME – Power Management Event) must propagate through the PCI bridge to reach the CPU. Some BIOS implementations do not enable PME on intermediate bridges.
Check if PME is enabled on the bridge:
# Find the NIC's PCI address
lspci | grep -i ethernet
# e.g., 05:00.0 Ethernet controller: Intel ...
# Find its parent bridge
lspci -t
# Look for the tree path to your NIC
# Check PME on the bridge
sudo lspci -vvs 00:1c.0 | grep -i pme
# Look for "PME-Enable+" (good) or "PME-Enable-" (bad)
If PME is disabled on the bridge, enable it with a udev rule:
services.udev.extraRules = ''
# Enable PME on PCI bridge for NIC WoL
ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x8086", ATTR{device}=="0x7ab8", RUN+="${pkgs.pciutils}/bin/setpci -s %k CAP_PM+04.W=0100:0100"
'';
The setpci command sets the PME_En bit in the PCI PM capability. Replace the vendor/device IDs with those from your bridge:
sudo lspci -nn -s 00:1c.0
# e.g., 00:1c.0 PCI bridge [0604]: Intel Corporation ... [8086:7ab8]
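The PME check can be reduced to a one-word answer for scripting. A minimal sketch, assuming the lspci -vv status line format shown earlier ("PME-Enable+" or "PME-Enable-"); pme_state is a hypothetical helper name.

```shell
# pme_state: read `lspci -vv` output for one device on stdin and print
# "enabled", "disabled", or "unknown" (no power-management status line).
pme_state() {
    awk '
        index($0, "PME-Enable+") { state = "enabled" }
        index($0, "PME-Enable-") { state = "disabled" }
        END { print (state == "" ? "unknown" : state) }
    '
}

# Example (not run here; 00:1c.0 is the example bridge address):
#   sudo lspci -vvs 00:1c.0 | pme_state
```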
Finding the right bridge
If lspci -t is hard to read, use this to trace the full path from NIC to root:
# Starting from the NIC PCI address (e.g., 05:00.0), resolve the sysfs symlink
readlink -f /sys/bus/pci/devices/0000:05:00.0
# e.g., /sys/devices/pci0000:00/0000:00:1c.0/0000:05:00.0
# Each PCI address before the last path component is a bridge above the NIC
Each bridge in the chain must have PME enabled for WoL to work. In practice, it is usually only one bridge between the NIC and the root that needs fixing.
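The chain can also be extracted from the resolved sysfs path, since every bridge appears as a path component. A sketch, assuming the usual domain:bus:device.function naming; pci_chain is a hypothetical helper name.

```shell
# pci_chain SYSFS_PATH: print every PCI address found in a resolved sysfs
# device path, root side first. The last line is the device itself; the
# lines before it are the bridges above it.
pci_chain() {
    printf '%s\n' "$1" | tr '/' '\n' |
        grep -E '^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$'
}

# Example (not run here; 0000:05:00.0 is the example NIC address):
#   pci_chain "$(readlink -f /sys/bus/pci/devices/0000:05:00.0)"
```

Each printed address except the last can then be passed to lspci -vvs to check its PME state.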
7. Still not working
If WoL still fails after all the above:
- Check dmesg on wake – after waking with the power button, look for clues: dmesg | grep -i -E "wake|suspend|pme|wol"
- Try a different NIC – if your motherboard has multiple ethernet ports, try WoL on each one. Onboard Intel NICs are most reliable.
- Test with a minimal NixOS config – remove all non-essential services and test WoL in isolation. If it works minimal but not full, bisect your config.
- Check the NIC firmware – some NICs need firmware loaded at boot for WoL. Check dmesg | grep firmware for errors.
What’s next
- Monitoring and alerts – disk health alerts
- Day-to-day NAS usage – the reboot/unlock cycle
Related commands
Fan control
This guide covers how to drive chassis fans from HDD temperatures on a NixOS NAS using the braid.fanControl module.
Read this if you want quieter idle and predictable ramp under sustained disk load – BIOS fan curves cannot see HDD temperatures, only CPU and motherboard temperatures.
Why HDD-driven fan control
HDD longevity drops as drives run hotter, so the goal is to keep them under a target temperature – a widely used rule of thumb is ~40 C. The catch is that the BIOS fan curve can’t see drive temperature; it reads only CPU package temp and a motherboard sensor. So no matter how the BIOS ramps the chassis fans, nothing in that loop is actually watching the drives. The BIOS already protects the CPU regardless of its TDP – the drives are the part left unmonitored.
The fix is to move fan control into Linux userspace using drive temps as the signal. The kernel’s drivetemp module exposes each SATA drive’s SMART temperature as a standard hwmon input, and hddfancontrol reads those inputs and drives the chassis fan’s PWM proportionally to the hottest drive.
braid.fanControl wraps hddfancontrol so you only provide two hardware-specific values (the Super I/O platform device name and PWM channel number from pwmconfig) plus two calibration values (pwm.minStart/pwm.maxStop from hddfancontrol pwm-test). The module handles the systemd service, drivetemp loading, SATA hotswap udev rules, and crash recovery.
Scope: braid.fanControl monitors all visible SATA devices, not only braid pool members. Drives generate heat regardless of LUKS state, pool membership, or mount status – binding fan control to pool state would leave warm disks uncooled when the pool is locked or before first unlock. SAS drives are out of scope.
The stack
| Layer | Role |
|---|---|
| drivetemp (kernel) | Exposes each SATA drive’s SMART temp as an hwmon input |
| Super I/O driver (kernel) | Board-specific (nct6775, f71882fg, it87, …) – drives the chassis fan PWM headers |
| lm_sensors (userspace) | Provides sensors, sensors-detect, pwmconfig for discovery |
| hddfancontrol (userspace) | Reads drivetemp hwmon inputs for all SATA drives, ramps PWM from the hottest |
| braid.fanControl (NixOS) | Runs hddfancontrol as a systemd service, handles SATA hotswap and crash recovery |
Setup has two phases: interactive discovery on the running machine (one-time), then committing the result to Nix.
Prerequisites
- BIOS: put chassis fan headers into software/manual control, and match the header mode to the fan type – PWM for 4-pin fans, DC (voltage) for 3-pin fans. Getting this wrong leaves the fan either stuck at a fixed speed or uncontrollable from userspace. If unsure, pwmconfig’s spin-down test (below) will tell you: a fan on the wrong header mode will not ramp down.
- Leave the CPU fan header on BIOS auto. Don’t fight the board’s package thermal logic with userspace – the BIOS is better at protecting the CPU than you are.
Discovery
Discovery is a one-time interactive procedure. Its only output is four values you paste into Nix at the end:
- pwm.platformDevice – platform device name of the Super I/O chip (e.g. f71882fg.656)
- pwm.number – PWM channel number on that chip (e.g. 2 for pwm2)
- pwm.minStart – PWM value needed to start the fan from standstill
- pwm.maxStop – PWM value below which the spinning fan stalls
Install the tooling and load the sensor modules
braid.fanControl loads drivetemp automatically, but the interactive operator tools (sensors, sensors-detect, pwmconfig, hddfancontrol) must also be on your PATH during discovery. Add them now, plus your board’s Super I/O driver once you know it. They are only required for discovery, but they can stay in the committed config so future re-runs after drive swaps or chassis changes have the same tools available:
{ pkgs, ... }:
{
environment.systemPackages = [ pkgs.lm_sensors pkgs.hddfancontrol ];
boot.kernelModules = [ "coretemp" ]; # drivetemp added by braid.fanControl
}
Rebuild, then confirm you see per-drive temps:
sensors | grep -A1 drivetemp
You should see one drivetemp-scsi-*-0 block per SATA drive, each showing a current temp1 reading. drivetemp must be loaded before you run pwmconfig, or drive temps will not appear as eligible fan inputs.
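The same readings are available as plain files under /sys/class/hwmon, which is the signal the fan loop is built on. The sketch below finds the hottest drive from those files, assuming the standard hwmon layout (a name file containing "drivetemp" and temp1_input in millidegrees Celsius). hottest_drivetemp is a hypothetical helper for illustration, not how hddfancontrol is implemented.

```shell
# hottest_drivetemp [HWMON_ROOT]: print the highest drivetemp reading in
# whole degrees Celsius. temp1_input is reported in millidegrees.
hottest_drivetemp() (
    root=${1:-/sys/class/hwmon}
    max=
    for dir in "$root"/hwmon*; do
        [ -r "$dir/name" ] || continue
        [ "$(cat "$dir/name")" = "drivetemp" ] || continue   # skip CPU, board, DIMM sensors
        [ -r "$dir/temp1_input" ] || continue
        milli=$(cat "$dir/temp1_input")
        if [ -z "$max" ] || [ "$milli" -gt "$max" ]; then
            max=$milli
        fi
    done
    [ -n "$max" ] || { echo "no drivetemp sensors found" >&2; exit 1; }
    echo $((max / 1000))
)

# Example (not run here):
#   hottest_drivetemp
```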
Find your Super I/O chip
Run sudo sensors-detect and accept the defaults. When it asks whether to write /etc/modules-load.d/lm_sensors.conf, answer no – on NixOS, kernel modules are declared in boot.kernelModules, not in /etc.
At the end sensors-detect prints a summary. For most boards it names a driver (nct6775, it87, …); add that driver to boot.kernelModules alongside coretemp, rebuild, and confirm a new block appears in sensors showing fan RPMs and PWMs.
If the summary says Found unknown chip with ID 0xXXXX, sensors-detect’s chip-ID table has fallen behind the kernel. The kernel driver may already support your chip even though the detect script doesn’t recognize it. Grep the ID in the kernel source to find the driver:
# on github, search drivers/hwmon/*.c in torvalds/linux for the ID
# e.g., 0x1502 turns up in drivers/hwmon/f71882fg.c, so the module is f71882fg
Add the module you found to boot.kernelModules. If modinfo <module> works and sensors still shows no new block after rebuild, move on to the next section.
“Device or resource busy” on module load
If dmesg shows your Super I/O driver correctly identifying the chip but modprobe fails with Device or resource busy, ACPI has reserved the hwmon I/O region. The fix is a kernel parameter:
boot.kernelParams = [ "acpi_enforce_resources=lax" ];
This requires a full reboot – kernel command line changes don’t apply on nixos-rebuild switch alone. After the reboot, sensors should show a block for your Super I/O chip with fan RPMs and PWMs.
Map fans to PWMs with pwmconfig
pwmconfig identifies which PWM controls which fan by briefly stopping each fan in turn. Run it when drives are idle (not mid-scrub or rebuild) – a stalled fan during sustained write load is a bad place to be.
Before starting, record each PWM’s current enable value. pwmconfig flips them to manual (1) to run its spin-down test, and the meaning of other values is driver-specific (e.g. f71882fg uses 0=off / 1=manual / 2=auto; other drivers differ). Restoring the original is safer than hard-coding a mode:
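for p in /sys/class/hwmon/hwmon*/pwm[0-9]_enable /sys/class/hwmon/hwmon*/device/pwm[0-9]_enable; do
  [ -e "$p" ] || continue # skip the layout this board does not use
  printf '%s = %s\n' "$p" "$(cat "$p")"
done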
Save that output somewhere you can read after pwmconfig exits. Then run:
sudo pwmconfig
It walks each PWM, asks whether to switch it to manual (say yes so the spin-down test can run), then stops each fan briefly and asks which fanN_input reading dropped. Answer based on what you observe in the tool’s output.
After identification, it asks which fans to configure. Pick only the chassis fans. Skip the CPU PWM – leave it BIOS-controlled. Also skip any PWM whose fan did not respond (unpopulated header, or fan/header mode mismatch in BIOS).
pwmconfig writes an /etc/fancontrol file at the end. You won’t use that file (braid uses hddfancontrol, not vanilla fancontrol), but the tool’s spin-down output is still how you identify the PWM path and measure stall behavior. Record the PWM sysfs path for the chassis fan – something like /sys/devices/platform/<super-io>/hwmon/hwmonN/device/pwmN.
Translate the PWM path to a platform device
braid.fanControl takes the stable platform device name plus the PWM channel number, and resolves the (unstable) hwmonN segment at service start. Translate the pwmconfig-surfaced sysfs path with:
pwm=/sys/class/hwmon/hwmon4/device/pwm2 # from pwmconfig output
pwm_dir=$(dirname "$pwm")
if [ "$(basename "$pwm_dir")" != device ]; then
pwm_dir="$pwm_dir/device"
fi
basename "$(readlink -f "$pwm_dir")"
# -> f71882fg.656
The if branch handles both sysfs layouts: hwmon*/device/pwmN (common on f71882fg, nct6775) and hwmon*/pwmN (fallback). Without it, the fallback layout resolves to hwmon4 instead of the platform device.
The PWM number is the numeric suffix on the pwmN filename (2 in the example above).
After pwmconfig exits, restore each skipped PWM to the value you recorded:
echo <original> | sudo tee /sys/class/hwmon/hwmon<N>/device/pwm<K>_enable
Measure minStart and maxStop with hddfancontrol pwm-test
hddfancontrol pwm-test ramps the PWM up and down while measuring fan RPM. It finds:
- pwm.minStart – the PWM at which a stopped fan begins spinning again
- pwm.maxStop – the highest PWM at which a spinning fan stalls
Run it against the chassis PWM path from the previous step:
sudo hddfancontrol pwm-test -p /sys/devices/platform/.../pwmN
It takes a couple of minutes (ramps slowly to avoid bouncing the fan). Record the final minStart and maxStop values it prints.
If the fan has a hardware RPM floor (common on voltage-controlled 3-pin fans, and some boards’ chassis headers even in PWM mode), pwm.maxStop will be 0 and pwm.minStart will be some low value – the fan never actually stops. That’s fine; hddfancontrol still handles the ramp correctly. The --min-fan-speed-prct floor in braid.fanControl prevents the daemon from commanding the fan off in any case.
Committing to Nix
Fan control is a braid sub-feature: it activates only when braid.enable = true (see Getting started). The recipes below show the full braid block; merge the non-braid lines (boot.*, environment.systemPackages) into your existing config.
Minimal recipe
Paste the four discovery values into braid.fanControl:
{ pkgs, ... }:
{
environment.systemPackages = [ pkgs.lm_sensors ]; # optional: tools for re-running discovery
boot.kernelModules = [ "coretemp" "nct6775" ]; # your Super I/O driver here
# boot.kernelParams = [ "acpi_enforce_resources=lax" ]; # only if needed
braid = {
enable = true; # fan control only runs when the braid module is enabled
fanControl = {
enable = true;
pwm = {
platformDevice = "nct6775.656";
number = 2;
minStart = 65; # from hddfancontrol pwm-test
maxStop = 60; # from hddfancontrol pwm-test
};
};
};
}
The module resolves the PWM sysfs path at service start by globbing /sys/devices/platform/<platformDevice>/hwmon/hwmon*/{device/,}pwm<number>, which handles hwmonN renumbering across reboots. The platform device name (nct6775.656, f71882fg.656, etc.) is stable.
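To see which file that glob would pick on your machine, the lookup can be reproduced by hand. This is a sketch of the same idea, not the module's actual code; resolve_pwm and its optional third argument (an alternate root, used only for testing) are hypothetical.

```shell
# resolve_pwm PLATFORM_DEVICE NUMBER [PLATFORM_ROOT]: print the first
# existing pwm<NUMBER> file under either hwmon layout, or fail.
resolve_pwm() (
    device=$1
    number=$2
    root=${3:-/sys/devices/platform}
    for candidate in \
        "$root/$device"/hwmon/hwmon*/device/pwm"$number" \
        "$root/$device"/hwmon/hwmon*/pwm"$number"
    do
        # An unmatched glob stays literal, so test for existence.
        if [ -e "$candidate" ]; then
            printf '%s\n' "$candidate"
            exit 0
        fi
    done
    echo "no pwm$number found for $device" >&2
    exit 1
)

# Example (not run here; values from the recipe above):
#   resolve_pwm nct6775.656 2
```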
Sane defaults for the rest:
- minTemp = 30 / maxTemp = 40 – fan floors below 30 C, ramps to full at 40 C
- minFanSpeedPercent = 20 – fan never drops below 20% of range (conservative; upstream hddfancontrol default)
- interval = "30s" – 30-second polling interval
Override any of these in the braid.fanControl block if you want a different curve. See NixOS configuration for the full option table.
Tuning the curve
- Ramp starts too soon / fan audibly spools early on idle: raise minTemp (try 32-34).
- Drives climbing past 42-44 C under sustained load: lower maxTemp (try 38) or raise minFanSpeedPercent (try 30).
- Fan noticeably oscillating: raise interval (e.g. "60s"). HDDs heat slowly, so aggressive polling only adds jitter.
Additional sensor modules
For ECC DIMM temp monitoring (visible in sensors; not used by hddfancontrol directly):
boot.kernelModules = [ "coretemp" "nct6775" "jc42" ];
Verification
Watch both the drivetemp input and the PWM/RPM, not RPM alone. CPU heat or ambient temperature can produce a false-positive fan ramp if you’re eyeballing only RPM.
A self-contained heat source on a braid NAS is a btrfs scrub. It reads every extent on every drive, which is representative NAS load and needs no pre-staged payload. The example below uses /mnt/storage as a concrete mount point – substitute your own pool mount:
# pane 1: start the scrub
sudo btrfs scrub start /mnt/storage
# pane 2: watch the thermal signals
watch -n2 sensors
# pane 3: follow the hddfancontrol daemon log
journalctl -u hddfancontrol-braid -f
Expected: drive temps climb 3-8 C over 10+ minutes (HDDs heat slowly), and the PWM tracks in step per your minTemp/maxTemp curve. The daemon log prints temperature readings and speed changes as it polls.
Cancel anytime with:
sudo btrfs scrub cancel /mnt/storage
If drive temps climb but PWM doesn’t move, double-check the resolved PWM file is writable (ls -l /sys/devices/platform/<platformDevice>/hwmon/hwmon*/{device/,}pwm<number>) and that hddfancontrol-braid is running (systemctl status hddfancontrol-braid).
Monitoring commands
Quick reference for monitoring the fan control loop on a running system. These paths assume an f71882fg-family Super I/O; substitute your platform device if different.
# Live chassis fan RPM + drive temps
watch -n2 'cat /sys/devices/platform/f71882fg.656/fan2_input; sensors "drivetemp-*"'
# Follow daemon log (temp readings, speed changes)
journalctl -u hddfancontrol-braid -f
# Current PWM value (0-255)
cat /sys/devices/platform/f71882fg.656/pwm2
# All fan channels at a glance (RPM, PWM, control mode)
dev=/sys/devices/platform/f71882fg.656
for i in 1 2 3; do
  echo "fan${i}: $(cat "$dev/fan${i}_input") RPM, pwm${i}: $(cat "$dev/pwm${i}"), enable: $(cat "$dev/pwm${i}_enable")"
done
# Service status
systemctl status hddfancontrol-braid
# All hwmon sensors (CPU, board, DIMM, drives)
sensors
# SMART details for a specific drive
sudo smartctl -a /dev/sda
On f71882fg, the pwmN_enable values are 0=off, 1=manual (hddfancontrol sets this), 2=BIOS auto; other drivers may number the modes differently. hddfancontrol is configured with --restore-fan-settings, so a clean service stop restores the original enable mode.
TUI fans panel
braid tui’s Data tab gains a Fans row when fan control is enabled. The
section title shows daemon: status for hddfancontrol-braid.service; the
row shows current PWM/RPM, the Driving column names the hottest drive setting
the curve, and the Curve column shows the configured temperature-to-speed
range. The panel polls every 5 seconds. Press r to refresh both pool and fan
probes immediately.
When braid.fanControl isn’t enough
braid.fanControl drives a single chassis PWM from the hottest SATA drive. That covers the common NAS case. If you need more control – multiple PWMs with different curves, PID-based responsiveness, non-SATA drive temperature sources – the usual escape hatches:
- Configure services.hddfancontrol directly (nixpkgs’s module supports multiple daemons, per-fan config).
- fan2go – Go daemon; supports multiple sensors and PID curves.
- CoolerControl – more featureful, GUI-oriented.
Disable braid.fanControl.enable and bring your own solution. The drivetemp kernel module that braid loads is the only piece you’d need to keep.
Worked example: ASRock Industrial IMB-X1231
A concrete walk-through of the discovery phase on one board, to show what the unknown-chip and ACPI-busy paths look like in practice.
Hardware
- Board: ASRock Industrial IMB-X1231 (mini-ITX, 12th/13th gen Intel)
- CPU: Intel i3-14100T (35W TDP)
- Memory: ECC SODIMM with a jc42-compatible thermal sensor on SMBus
- Chassis fan: single 120mm rear, voltage-controlled (not 4-pin PWM)
sensors-detect reported an unknown chip
Probing for Super-I/O at 0x2e/0x2f
Trying family `VIA/Winbond/Nuvoton/Fintek'... Yes
Found unknown chip with ID 0x1502
(logical device 4 has address 0x290, could be sensors)
...
Probing for `National Semiconductor LM78' at 0x290... Success!
(confidence 6, driver `lm78')
The lm78 hit is a false positive: it’s a 1995 chip whose ISA probe signature collides with anything living at 0x290. The real chip is whatever has devid 0x1502. Ignore the lm78 recommendation.
Finding the right driver in kernel source
A grep of drivers/hwmon/ in torvalds/linux for 0x1502 pointed at f71882fg.c:
#define SIO_F81866_ID 0x1010
#define SIO_F81966_ID 0x1502
/* ... */
case SIO_F81866_ID:
case SIO_F81966_ID:
So: Fintek F81966, register-compatible with the F81866, driven by the f71882fg module – which has supported this ID since kernel 5.16. lm_sensors 3.6.2’s chip-ID table just hadn’t caught up.
ACPI held the hwmon I/O region
After adding f71882fg to boot.kernelModules and rebuilding, the driver identified the chip in dmesg:
f71882fg: Found f81866a chip at 0x290, revision 48, devid: 1502
But modprobe failed with Device or resource busy. Added boot.kernelParams = [ "acpi_enforce_resources=lax" ], rebooted, and sensors showed the full Super I/O block:
f81866a-isa-0290
fan1: 1489 RPM <- CPU cooler
fan2: 1573 RPM <- rear chassis, voltage-controlled
fan3: 0 RPM <- unpopulated header
pwm1: 58% pwm2: 58% pwm3: 72%
temp1: 36.0 C temp2: 20.0 C temp3: 37.0 C
pwmconfig mapped the fans
The spin-down test confirmed:
- pwm2 -> fan2 (chassis fan; voltage-controlled – RPM floors at ~395 and never fully stops regardless of PWM). Selected for braid.fanControl.
- pwm1 -> fan1 (4-pin PWM CPU fan; stopped cleanly at PWM=60). Skipped – left on BIOS auto.
- pwm3 -> no fan (unpopulated header). Also left on auto.
hddfancontrol pwm-test
sudo hddfancontrol pwm-test -p /sys/devices/platform/f71882fg.656/hwmon/hwmon4/device/pwm2
...
minStart: 65
maxStop: 60
Final Nix config
boot.kernelModules = [ "coretemp" "f71882fg" "jc42" ];
boot.kernelParams = [ "acpi_enforce_resources=lax" ];
braid = {
enable = true;
fanControl = {
enable = true;
pwm = {
platformDevice = "f71882fg.656";
number = 2;
minStart = 65;
maxStop = 60;
};
};
};
End-to-end check
With drive temp at 32 C and the default curve (minTemp=30, maxTemp=40, minFanSpeedPercent=20), the expected speed is pwm2 = 20% + (32 - 30) / (40 - 30) * 80% = 36% of the PWM range above maxStop. Observed: pwm2 climbed smoothly with drive temp under a scrub, and RPM tracked the pwmconfig correlation table, so the control loop matches the arithmetic.
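The expected value in this check is a linear interpolation between the speed floor and full speed. A minimal sketch of that arithmetic, assuming whole-degree temperatures; expected_fan_percent is a hypothetical helper, not hddfancontrol's own code, and it does not model how the percentage maps onto raw PWM values.

```shell
# expected_fan_percent TEMP MIN_TEMP MAX_TEMP MIN_PERCENT
# Linear ramp from MIN_PERCENT at MIN_TEMP to 100 at MAX_TEMP, clamped at
# both ends. Integer arithmetic only.
expected_fan_percent() (
    temp=$1 min_temp=$2 max_temp=$3 min_pct=$4
    if [ "$temp" -le "$min_temp" ]; then
        echo "$min_pct"
    elif [ "$temp" -ge "$max_temp" ]; then
        echo 100
    else
        echo $(( min_pct + (temp - min_temp) * (100 - min_pct) / (max_temp - min_temp) ))
    fi
)

# Default curve at 32 C:
#   expected_fan_percent 32 30 40 20    # prints 36
```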
What’s next
- Power management – suspend/resume and WoL, which interact with fan control
- Monitoring and alerts – SMART-based alerting complements active cooling
Related
- Arch Wiki: Fan speed control – distro-neutral reference for lm_sensors and fan control
- Kernel drivetemp driver – what the module exposes
- hddfancontrol – upstream project
UPS
This guide covers enabling UPS (uninterruptible power supply) support on a braid NAS via NUT (Network UPS Tools).
Enabling UPS support (braid.enable = true plus braid.ups.enable = true) turns on three behaviors:
- Orderly poweroff on low battery. When the UPS reports critical, NUT’s upsmon invokes systemctl poweroff. systemd unwinds braid-online.service’s ExecStop, which runs braid lock and cleanly unmounts the pool before the battery exhausts.
- Preflight refusal without verified utility power. braid add / remove / remove-missing / replace check UPS state at startup and refuse to begin a pool mutation unless the UPS reports verified utility power (OL). This narrows the surface that journal recovery needs to cover.
- Live state visibility. braid ups status (and the TUI Data tab) show the parsed upsc output: status flags, battery charge, runtime remaining, load, estimated watts, input voltage, and device info.
Scope
v1 supports a single USB-connected UPS on the NAS, monitored by the NAS itself (single-host standalone). Non-USB drivers work through the escape hatch but are not tested.
Minimal config
# configuration.nix
{
braid = {
enable = true;
ups.enable = true;
};
}
Defaults: name = "ups", driver = "usbhid-ups", port = "auto". Rebuild
and plug the UPS’s USB cable in; NUT’s auto-detect finds the device.
Override the driver or port for non-USB UPSes:
braid = {
enable = true;
ups = {
enable = true;
name = "myups";
driver = "apcsmart";
port = "/dev/ttyS0";
};
};
Checking status
# Curated human summary
sudo braid ups status
# Machine-readable JSON (stable shape for scripts)
sudo braid ups status --json | jq .
Example human output:
UPS: ups
Status: OL
Battery: 100%
Runtime: 30m 0s
Load: 17% (56 W estimated)
Input: 120.0 V (transfer 88-142 V)
Device: APC Back-UPS ES 550G
Battery manufactured: 2023/04/12
Last test: Done and passed
The watts line is labeled estimated and omitted entirely if the UPS
does not report both ups.load and ups.realpower.nominal.
The --json output serializes the full parsed model. Distinct error
sentinels are emitted for the common non-OK cases:
| Condition | JSON shape | Exit code |
|---|---|---|
| UPS reachable with populated ups.status | serialized UpscOutput | 0 |
| UPS reachable but ups.status empty | serialized UpscOutput plus "warning": "ups_status_empty" | 0 |
| UPS query failed | {"error": "query_failed", "detail": "exit <code>: <stderr>"} | 1 |
| UPS invocation failed (upsc could not run – missing on PATH, killed by signal, or other runner-level failure) | {"error": "invocation_failed", "detail": "command failed: upsc ups: <reason>"} | 1 |
| UPS not enabled | {"error": "ups_not_enabled"} | 0 |
TUI UPS panel
braid tui’s Data tab gains a UPS row when UPS support is enabled.
Status text is color-coded by severity:
- Green – OL (on utility power).
- Yellow – OB (on battery, not yet critical).
- Red – LB / TESTFAIL / COMMBAD / FSD (critical; shutdown imminent or comms-loss).
- DarkGray – UPS query failed, or no UPS state available yet.
The panel polls on the same 5-second cadence as the
TUI fans panel. Press r to refresh both
pool and UPS probes immediately.
What happens on low battery
1. upsmon sees ups.status: OB LB, declares the UPS critical.
2. upsmon runs its configured SHUTDOWNCMD. braid overrides the nixpkgs default (shutdown now) with systemctl poweroff.
3. systemd walks its shutdown sequence. braid-online.service stops with ExecStop running braid lock, which unmounts the pool and closes LUKS.
4. The host powers off before the battery exhausts.
Under the default upsmon timings (POLLFREQ = POLLFREQALERT = 5,
FINALDELAY = 5) the window between LB detection and poweroff is
~10 seconds, plus however long braid lock takes. The default
battery.runtime.low in most UPS drivers is around 120 seconds, which
is enough headroom for a single-disk pool’s clean teardown. Larger
pools may need a wider battery.runtime.low (set at the NUT level,
not through braid).
Mutation refusal when utility power is not verified
With UPS enabled, braid add / remove / remove-missing / replace
refuse to start unless upsc returns a non-empty status set that
contains OL and no known blocker. The refusal cases are:
- on-battery (OB)
- a critical flag the TUI paints red (LB / TESTFAIL / COMMBAD / FSD)
- OL missing from an otherwise non-blocking status set
- upsc query or invocation failure (stopped daemon, unknown UPS name, or another fatal NUT error – the message includes upsc’s stderr when it exits non-zero)
- an empty or missing ups.status
Known non-critical advisory states such as OL RB, and unknown tokens
co-present with OL and no known blocker, still pass: the OL flag is
the affirmative utility-power proof, not a guarantee of full battery
health. Example refusal:
$ sudo braid add newdisk=/dev/disk/by-id/ata-TOSHIBA_NEW
error: cannot verify UPS is on utility power (UPS reports on-battery)
-- refusing to start add. Check 'braid ups status', restore utility
power, then retry.
Recovery: run braid ups status to confirm, fix the UPS/NUT state,
restore utility power, wait for the status to return to a trusted OL,
and retry the command.
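The gate described in this section can be approximated in a few lines of shell. This is a sketch of the documented rule only: the function name ups_preflight_ok is hypothetical, and braid's real check also covers query and invocation failures, which this sketch reduces to "empty status".

```shell
# Hypothetical sketch of the preflight rule; not braid's actual implementation.
# $1 = the raw ups.status string. Returns 0 when a mutation would be allowed.
ups_preflight_ok() {
  upo_has_ol=no
  for upo_token in $1; do
    case "$upo_token" in
      OB|LB|TESTFAIL|COMMBAD|FSD) return 1 ;;  # known blocker
      OL) upo_has_ol=yes ;;                    # affirmative utility-power proof
      *) : ;;                                  # advisory or unknown token: ignored
    esac
  done
  # An empty status, or one without OL, is refused.
  [ "$upo_has_ol" = yes ]
}
```

With the default UPS name, a manual check could look like `ups_preflight_ok "$(upsc ups ups.status 2>/dev/null)" && echo "mutations allowed"`.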
Two clarifications:
- braid doctor’s ups_daemon: ok means the configured NUT daemon is reachable; it is not a guarantee that mutating-command preflight will pass. The refusal error from add/remove/remove-missing/replace is the primary channel for the exact mutation-readiness blocker.
- The OL gate assumes the configured NUT driver reports OL on utility power as documented by NUT. If a device or driver violates that contract, inspect with braid ups status; the recovery is to fix the NUT driver/config or disable braid.ups until the UPS state can be trusted.
doctor checks
braid doctor adds two UPS-adjacent checks when UPS support is enabled:
- ups daemon – fails if upsc is missing or cannot be spawned, because braid cannot verify the enabled UPS shutdown path. It warns if upsc runs but cannot reach the daemon or exits non-zero. Fix missing upsc by checking the braid wrapper/NUT package path; fix daemon reachability with systemctl status upsd.service.
- braid-online – fails (high severity) if the pool is mounted but braid-online.service is not active. Without that service active, the UPS shutdown path does not unmount the pool, and the safety guarantee silently breaks. Fix by running systemctl start braid-online.service or braid unlock.
Both checks skip when UPS support is disabled, and the braid-online check additionally skips when the pool is not mounted (there is nothing for the UPS shutdown path to unmount).
v1 limitation: no async alert
There is no asynchronous notification when the UPS goes on battery or
loses comms in v1. Operators who are not actively watching braid ups status or the TUI will not see those conditions – only the orderly
shutdown on LB is automatic.
This is deliberate: integrating UPS events into braid’s shared alert
model requires splitting AlertCause by persistence semantics
(latched-until-ack for disk errors, active-while-condition-holds for
live UPS states) and is out of scope for v1. See
decisions/020-ups-integration.md
for the open-question status.
If you need asynchronous UPS notifications today, wire NUT’s
NOTIFYCMD directly – braid does not touch it.
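If you go that route, the handler can be a small script. upsmon passes the message text as the first argument and exports NOTIFYTYPE and UPSNAME, and an event only reaches NOTIFYCMD when its NOTIFYFLAG includes EXEC. The function name ups_notify and the choice of event types below are illustrative, and delivery (mail, a push service, and so on) is left open.

```shell
# Hypothetical NOTIFYCMD body; not shipped or configured by braid.
ups_notify() {
  # $1 is the message upsmon generated; NOTIFYTYPE and UPSNAME come from
  # upsmon's environment.
  case "${NOTIFYTYPE:-}" in
    ONBATT|LOWBATT|COMMBAD|NOCOMM)
      echo "UPS ${UPSNAME:-unknown} ${NOTIFYTYPE}: $1"
      ;;
    *)
      # Ignore other event types in this sketch.
      :
      ;;
  esac
}
```

In a real script you would call `ups_notify "$1"` and pipe its output to whatever notifier you already use.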
Related
- ADR: UPS integration – scope, shutdown path, preflight contract.
- upsc man page – the raw command braid parses.
- power.ups NixOS options – the underlying nixpkgs module braid layers opinionated defaults on.
NixOS configuration
Complete reference for the braid NixOS module options. Read this when setting up braid for the first time or tuning behavior after initial setup.
Minimal config
# flake.nix
{
inputs = {
nixpkgs.url = "github:NixOS/nixpkgs/nixos-26.05";
braid.url = "github:danneu/braid?ref=release";
};
outputs = { nixpkgs, braid, ... }: {
nixosConfigurations.myhost = nixpkgs.lib.nixosSystem {
system = "x86_64-linux";
modules = [
braid.nixosModules.default
./configuration.nix
];
};
};
}
?ref=release pins braid to its release channel: a moving branch that each
release fast-forwards to its tag. nix flake update braid is then the “upgrade to the
newest release” button, and flake.lock still pins the exact rev. The snippet
deliberately omits braid.inputs.nixpkgs.follows – see Binary cache
and Tool overrides for why no-follows is the default.
The braid.url input takes any of:
# flake.nix
braid.url = "github:danneu/braid?ref=release"; # newest release (default)
braid.url = "github:danneu/braid?ref=v0.0.1"; # pin a tag
braid.url = "github:danneu/braid?rev=<commit>"; # pin an exact commit
# configuration.nix
braid = {
enable = true;
};
nixosModules.default supplies braid.package automatically. Override it only
to build the CLI yourself.
Binary cache
braid publishes a prebuilt x86_64-linux CLI to a public Cachix cache on every
release. Add it so the NAS pulls the binary instead of recompiling Rust:
# configuration.nix
nix.settings = {
extra-substituters = [ "https://braid.cachix.org" ];
extra-trusted-public-keys = [ "braid.cachix.org-1:I/p7fx1z5n0+O80KzMuT7aXRdkVyHr/buZKaBu7HvJs=" ];
};
The cache only hits when braid resolves to the exact store path CI built – that
is, with the recommended no-follows setup above. Setting
braid.inputs.nixpkgs.follows rebuilds braid against your nixpkgs, producing a
different path and a cache miss. See Toolchain pinning
and ADR 029.
What you get for free
When braid.enable = true, the module sets up:
- Monthly btrfs scrub – timer + service tied to pool lifecycle. Configurable via braid.autoScrub.
- Resilient boot – a dead drive never blocks boot. LUKS open and btrfs mount are deferred to braid unlock, not wired into boot.initrd.
- Pinned toolchain – btrfs-progs, cryptsetup, util-linux, NUT, smartmontools, and ethtool are pinned to NixOS stable versions. Override with braid.packages.* if needed.
- Shell completions – bash, zsh, and fish completions registered automatically via clap_complete.
- smartd integration – services.smartd enabled by default with a braid-owned alert script. SMART failures trigger the braid alert service.
- Storage group – a storage group is created; mount point is set to root:storage 2770 after unlock. See Sharing and permissions.
- Disk health monitoring – polls btrfs device stats every 5 minutes, audible beep on errors. Configurable via braid.monitor.
- Fan control (opt-in) – drive chassis fans from the hottest SATA drive temp. Handles hddfancontrol, SATA hotswap restart, crash recovery. Configurable via braid.fanControl.
Module options
Core
| Option | Type | Default | Description |
|---|---|---|---|
| braid.enable | bool | false | Enable the braid module |
| braid.package | package or null | null | The braid CLI package; nixosModules.default defaults it to braid-cli-unwrapped |
| braid.mountPoint | path | /mnt/storage | Where to mount the btrfs pool |
| braid.poolAccessGroup | string or null | "storage" | Group for mount point access. null to disable |
| braid.lockSystemdStopDeadlineSecs | positive int | 270 | Seconds to wait for the pool lock during braid-online.service ExecStop; must stay below the unit’s TimeoutStopSec |
Tool overrides
| Option | Type | Default | Description |
|---|---|---|---|
| braid.packages.cryptsetup | package | pkgs.cryptsetup | cryptsetup package |
| braid.packages.btrfsProgs | package | pkgs.btrfs-progs | btrfs-progs package |
| braid.packages.utilLinux | package | pkgs.util-linux | util-linux package |
| braid.packages.nut | package | pkgs.nut | NUT package |
| braid.packages.smartmontools | package | pkgs.smartmontools | smartmontools package |
| braid.packages.ethtool | package | pkgs.ethtool | ethtool package |
Override these only if you need a specific version for compatibility testing. The recommended setup omits braid.inputs.nixpkgs.follows (the Minimal config example above), so nixosModules.default sources these defaults from braid’s own pinned nixpkgs (nixos-26.05) – the same versions the release binary cache is built against, so braid is a cache hit. Adding braid.inputs.nixpkgs.follows = "nixpkgs" is an advanced opt-out: it dedups your closure but rebuilds braid against your nixpkgs (a cache miss) and sources these tools from your nixpkgs instead. If you take it, keep your nixpkgs on the same NixOS stable release braid targets so the parsed tool output stays compatible – see Toolchain pinning.
Auto-scrub
| Option | Type | Default | Description |
|---|---|---|---|
| braid.autoScrub.enable | bool | true | Enable periodic btrfs scrub |
| braid.autoScrub.interval | string | "monthly" | systemd calendar expression |
The scrub timer is lifecycle-aware: it starts when the pool comes online and stops when the pool goes offline. Persistent = true ensures a missed scrub runs on next unlock (e.g. the pool was locked over a monthly boundary).
braid’s scrub conflicts with the NixOS built-in services.btrfs.autoScrub. If both are enabled, evaluation fails with a clear error. Disable one or the other.
Monitoring
| Option | Type | Default | Description |
|---|---|---|---|
| braid.monitor.enable | bool | true | Enable disk health monitoring |
| braid.monitor.interval | string | "5min" | Polling interval (systemd time span) |
| braid.monitor.beep | bool | true | Audible PC speaker beep on alert |
| braid.monitor.alertCommand | string or null | null | Custom command to run on alert |
When beep = true, the module unblacklists the pcspkr kernel module, creates a beep group, and sets up a udev rule for PC speaker access. The beep loops with exponential backoff (5s, 10s, 20s, 40s, …) capped at once every 15 minutes, until acknowledged with braid ack.
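To make the backoff schedule concrete, here is a small sketch that prints the delays between beeps. It assumes plain doubling from 5 seconds with the delay clamped at 900 seconds (15 minutes); the helper name beep_delays is hypothetical and the exact rounding braid uses near the cap is not documented here.

```shell
# Hypothetical illustration of the beep backoff; not braid's actual code.
# $1 = how many delays to print (seconds, one per line)
beep_delays() {
  bd_delay=5
  bd_count=0
  while [ "$bd_count" -lt "$1" ]; do
    echo "$bd_delay"
    bd_delay=$(( bd_delay * 2 ))
    # Clamp at 15 minutes (assumed to be a hard cap on the delay).
    if [ "$bd_delay" -gt 900 ]; then bd_delay=900; fi
    bd_count=$(( bd_count + 1 ))
  done
}
```

Under these assumptions `beep_delays 10` prints 5, 10, 20, 40, 80, 160, 320, 640, 900, 900, so the cap is reached after roughly 21 minutes of unacknowledged alerting.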
alertCommand runs in addition to the beep (not instead of). Use it for push notifications, email, etc.:
braid.monitor.alertCommand = "curl -s -d 'Disk error on NAS' https://ntfy.sh/my-nas-alerts";
See Monitoring and alerts for the full workflow.
Auto-unlock
| Option | Type | Default | Description |
|---|---|---|---|
| braid.autoUnlock.enable | bool | false | Enable USB keyfile auto-unlock |
| braid.autoUnlock.keyDevice | string | "" | Block device path (/dev/disk/by-id/...) |
| braid.autoUnlock.timeoutSec | positive int | 5 | Seconds to wait for USB device |
| braid.autoUnlock.allowDegraded | bool | false | Mount with missing devices |
keyDevice must use a /dev/disk/by-id/ path – /dev/sdX names shift when devices are added or removed.
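One way to find a stable path is to list the USB-attached entries under /dev/disk/by-id, which udev prefixes with usb-. The helper name list_usb_by_id is hypothetical, and the optional directory argument exists only so the sketch can be tried against a scratch directory.

```shell
# Hypothetical helper: list USB-attached entries in a by-id directory.
# $1 = directory to scan (defaults to /dev/disk/by-id)
list_usb_by_id() {
  for lubi_path in "${1:-/dev/disk/by-id}"/usb-*; do
    # Skip the unexpanded pattern when nothing matches.
    if [ -e "$lubi_path" ]; then echo "$lubi_path"; fi
  done
}
```

Run it once with the stick unplugged and once with it plugged in; the new entries belong to the key device. Entries ending in -partN refer to partitions on that device.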
The auto-unlock service mounts the USB read-only, reads braid.key, unlocks the pool, and unmounts the USB immediately. The keyfile is never left accessible. If the USB is absent at boot, the service exits cleanly without blocking boot.
See Auto-unlock for the enrollment and setup workflow.
Auto-suspend
| Option | Type | Default | Description |
|---|---|---|---|
| braid.autoSuspend.enable | bool | false | Suspend NAS when idle |
| braid.autoSuspend.wolInterface | string or null | null | Network interface for Wake-on-LAN (required) |
| braid.autoSuspend.idleTime | positive int | 900 | Seconds idle before suspend |
Requires a wired ethernet interface – WiFi interfaces are rejected at evaluation time (WoL needs ethtool, which does not work for WiFi).
Activity checks that block suspend:
- braid idle – scrub or any btrfs kernel exclusive operation (balance, device add/remove/replace, resize, swap activate)
- Active SSH sessions
- Active local sessions (TTY/X11/Wayland)
- SMB connections (auto-detected if services.samba is enabled)
- NFS connections (auto-detected if services.nfs.server is enabled)
The scrub timer is registered as a wakeup source so the NAS wakes for scheduled scrubs.
See Power management for the full workflow.
Fan control
| Option | Type | Default | Description |
|---|---|---|---|
| braid.fanControl.enable | bool | false | Drive chassis fans from HDD temps |
| braid.fanControl.pwm.platformDevice | string | (required) | Platform device name under /sys/devices/platform/ |
| braid.fanControl.pwm.number | int | (required) | PWM channel number (1-based) |
| braid.fanControl.pwm.minStart | int | (required) | Minimum PWM to start fan from standstill |
| braid.fanControl.pwm.maxStop | int | (required) | PWM below which a spinning fan stalls |
| braid.fanControl.minTemp | int | 30 | Temperature (C) at which fan runs at minimum speed |
| braid.fanControl.maxTemp | int | 40 | Temperature (C) at which fan runs at full speed |
| braid.fanControl.minFanSpeedPercent | int | 20 | Minimum fan speed % (0 = fan may stop) |
| braid.fanControl.interval | string | "30s" | Temperature polling interval |
pwm.platformDevice and pwm.number are found via pwmconfig. pwm.minStart and pwm.maxStop are measured with hddfancontrol pwm-test -p <pwm-path>. All four are hardware-specific.
Monitors all visible SATA devices (not only braid pool members). Requires a board-specific Super I/O driver in boot.kernelModules – see Fan control for the hardware discovery workflow.
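To get a feel for how minTemp, maxTemp, and minFanSpeedPercent interact, the sketch below computes a fan speed for a given drive temperature. It assumes a linear ramp between the two temperatures, which the option descriptions suggest but do not state; the actual curve is whatever hddfancontrol implements. The helper name fan_speed_percent is hypothetical.

```shell
# Hypothetical illustration, assuming a linear ramp between minTemp and maxTemp.
# $1 = hottest drive temp (C), $2 = minTemp, $3 = maxTemp, $4 = minFanSpeedPercent
fan_speed_percent() {
  if [ "$1" -le "$2" ]; then
    echo "$4"      # at or below minTemp: minimum speed
  elif [ "$1" -ge "$3" ]; then
    echo 100       # at or above maxTemp: full speed
  else
    # Integer interpolation between the minimum speed and 100%.
    echo $(( $4 + ($1 - $2) * (100 - $4) / ($3 - $2) ))
  fi
}
```

With the defaults (30, 40, 20), a drive at 35 C would land at 60% under this assumption.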
UPS
| Option | Type | Default | Description |
|---|---|---|---|
| braid.ups.enable | bool | false | Enable UPS support via NUT (single-host standalone) |
| braid.ups.name | string | "ups" | UPS identifier for upsd/upsc |
| braid.ups.driver | string | "usbhid-ups" | NUT driver; the USB default covers most home-NAS UPSes |
| braid.ups.port | string | "auto" | Driver port; auto finds the first matching USB UPS |
When enabled, NUT triggers an orderly poweroff on low battery (unwinding braid-online.service -> braid lock -> unmount) and pool-mutating commands (add/remove/remove-missing/replace) refuse to start unless the UPS reports verified utility power (OL). Only name is written to /etc/braid/config.json, so braid ups status and the TUI know which UPS to query; driver and port configure the NUT driver only. Non-USB drivers (apcsmart, snmp-ups) are an escape hatch and not first-class.
See UPS for the setup workflow and live status.
Full config example
Every option with its default (or a representative value for required/optional fields):
braid = {
enable = true;
# package -- defaults to nixosModules.default's pinned braid-cli-unwrapped;
# set only to build the CLI yourself.
mountPoint = "/mnt/storage"; # default
poolAccessGroup = "storage"; # default; null to disable
lockSystemdStopDeadlineSecs = 270; # default; must stay below braid-online TimeoutStopSec
# Tool version overrides -- the recommended setup omits nixpkgs `follows`, so
# defaults come from braid's pinned nixos-26.05 (cache hit); `follows` is an
# opt-out that tracks your nixpkgs but rebuilds braid (cache miss). See "Tool overrides".
# packages.cryptsetup = pkgs.cryptsetup;
# packages.btrfsProgs = pkgs.btrfs-progs;
# packages.utilLinux = pkgs.util-linux;
# packages.nut = pkgs.nut;
# packages.smartmontools = pkgs.smartmontools;
# packages.ethtool = pkgs.ethtool;
autoScrub = {
enable = true; # default
interval = "monthly"; # default; any systemd calendar expression
};
monitor = {
enable = true; # default
interval = "5min"; # default
beep = true; # default
alertCommand = null; # default; e.g. "curl -s -d 'alert' https://ntfy.sh/my-nas"
};
autoUnlock = {
enable = false; # default
keyDevice = "/dev/disk/by-id/usb-Kingston_DataTraveler_XXXX-0:0";
timeoutSec = 5; # default
allowDegraded = false; # default
};
autoSuspend = {
enable = false; # default
wolInterface = "eno1";
idleTime = 900; # default (15 minutes)
};
fanControl = {
enable = false; # default; opt-in
pwm = {
platformDevice = "f71882fg.656"; # from pwmconfig (required)
number = 2; # from pwmconfig (required)
minStart = 65; # from hddfancontrol pwm-test (required)
maxStop = 60; # from hddfancontrol pwm-test (required)
};
minTemp = 30; # default
maxTemp = 40; # default
minFanSpeedPercent = 20; # default
interval = "30s"; # default
};
ups = {
enable = false; # default; opt-in
name = "ups"; # default
driver = "usbhid-ups"; # default
port = "auto"; # default
};
};
Related
- Getting started – first-time setup walkthrough
- Auto-unlock – USB keyfile enrollment
- Monitoring and alerts – alert workflow and custom commands
- Power management – auto-suspend and WoL setup
- UPS – NUT-backed orderly poweroff, preflight safety, live status
- Fan control – hardware discovery and fan control setup
- Sharing and permissions – storage group and Samba
Sharing and permissions
How braid manages mount point permissions and how to share the pool over the network. Read this when setting up file access for multiple users or configuring Samba/NFS.
Mount point permissions
After every mount-producing command (unlock, add, recover), braid sets the mount root to:
root:storage 2770
This means:
- Owner (root): full access
- Group (storage): read, write, execute
- Others: no access
- Setgid bit (2): new files and directories inherit the storage group
Any user in the storage group can read and write files under the mount point. The setgid bit ensures that files created by any group member are owned by the storage group, so all members can manage each other’s files.
Adding users to the storage group
# configuration.nix
users.users.alice = {
isNormalUser = true;
extraGroups = [ config.braid.poolAccessGroup ];
};
users.users.bob = {
isNormalUser = true;
extraGroups = [ config.braid.poolAccessGroup ];
};
Using config.braid.poolAccessGroup instead of the literal "storage" keeps the reference correct if you customize the group name.
For network-facing services like Jellyfin or Plex that should only read a single subtree, prefer mounting that subvolume separately and using POSIX ACLs over adding the service to the storage group. See Mounting subvolumes for the recipe.
Custom group name
braid.poolAccessGroup = "nas";
The group is created automatically. All behavior (mount permissions, setgid) works the same with any valid Unix group name.
Disabling the storage group
braid.poolAccessGroup = null;
When null, braid does not create a group or set permissions on the mount point after unlock. You manage permissions yourself.
Umask note
The setgid bit on the mount root ensures new files get the correct group. But the creating user’s umask controls the permission bits. The default umask (022) produces files with mode 644 (owner write, group/other read-only).
For collaborative write access where all group members can edit each other’s files, set a more permissive umask for processes that write to the pool:
# In a Samba share definition (see below), force create mode handles this.
# For SSH users, set umask in their shell profile:
programs.bash.interactiveShellInit = ''
umask 002
'';
With umask 002, new files are 664 (owner and group read-write, other read-only) and new directories are 775.
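The modes quoted above follow from masking the requested mode with the umask. A quick way to check other combinations is shown below; the helper name mode_after_umask is hypothetical, and it covers only the permission bits (the setgid bit is inherited separately from the parent directory).

```shell
# Hypothetical helper: permission bits that result from a requested mode and a umask.
# $1 = requested mode in octal (666 for new files, 777 for new directories)
# $2 = umask in octal
mode_after_umask() {
  # A leading 0 makes the shell read both values as octal.
  printf '%o\n' $(( 0$1 & ~0$2 ))
}
```

For example, `mode_after_umask 666 022` gives 644 and `mode_after_umask 666 002` gives 664, matching the figures in this section.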
Samba integration
Samba is not part of the braid module, but it works well with the braid mount point. Here is a declarative NixOS Samba config:
# configuration.nix
services.samba = {
enable = true;
openFirewall = true;
settings = {
global = {
workgroup = "WORKGROUP";
"server string" = "NAS";
security = "user";
};
storage = {
path = config.braid.mountPoint;
browseable = "yes";
"read only" = "no";
"valid users" = "@${config.braid.poolAccessGroup}";
# File permissions for Samba-created files
"create mask" = "0664";
"force create mode" = "0664";
"directory mask" = "2775";
"force directory mode" = "2775";
};
};
};
# Set Samba passwords (run once per user):
# sudo smbpasswd -a alice
Key points:
- valid users = @storage restricts the share to the storage group.
- force create mode and force directory mode ensure group-writable permissions regardless of the client’s umask.
- New files and directories inherit the storage group from the setgid bit braid sets on the mount root – a kernel behavior that does not require inherit permissions. force directory mode = 2775 keeps that setgid bit on Samba-created subdirectories so inheritance carries down the tree.
- Samba users must also be system users in the storage group.
Multiple shares
Create separate shares for different directories under the mount point:
services.samba.settings = {
photos = {
path = "${config.braid.mountPoint}/photos";
browseable = "yes";
"read only" = "no";
"valid users" = "@${config.braid.poolAccessGroup}";
"create mask" = "0664";
"force create mode" = "0664";
"directory mask" = "2775";
"force directory mode" = "2775";
};
media = {
path = "${config.braid.mountPoint}/media";
browseable = "yes";
"read only" = "yes"; # read-only share
"valid users" = "@${config.braid.poolAccessGroup}";
};
};
Binding shares to the pool lifecycle
By default, samba-smbd.service (the systemd unit NixOS creates from services.samba.enable) keeps running after braid lock. If a client is mid-transfer when you lock, umount blocks until the file handle is released. Wire the share into the pool lifecycle so systemd starts samba-smbd after braid unlock and stops it again before braid lock runs umount:
systemd.services.samba-smbd = {
# Start smbd when braid marks the pool online after a successful unlock.
wantedBy = [ "braid-online.service" ];
# Stop smbd when braid-online stops, before braid lock unmounts the pool.
bindsTo = [ "braid-online.service" ];
# Order smbd on the correct side of braid-online start and stop jobs.
after = [ "braid-online.service" ];
# Skip boot or direct starts when the braid mount point is not mounted.
unitConfig.ConditionPathIsMountPoint = config.braid.mountPoint;
};
All four fields are load-bearing and do different jobs:
- wantedBy – samba-smbd starts when braid-online.service starts (i.e. after braid unlock).
- bindsTo – samba-smbd stops if braid-online.service stops or goes inactive (i.e. before braid lock runs umount).
- after – ordering only, ensures samba-smbd is started/stopped on the correct side of braid-online.service.
- ConditionPathIsMountPoint – skips activation when the braid mount point is only an offline directory, so any start the triad did not initiate cannot serve an unmounted pool.
braid lock walks systemctl show -P BoundBy braid-online.service (the reverse of BindsTo=) and stops every consumer this way before unmount, and ConditionPathIsMountPoint keeps them from restarting against an offline pool. This is the same pattern braid’s own scrub timer uses (see modules/braid/storage.nix).
The condition matters even with wantedBy: NixOS also starts Samba at boot through samba.target (samba-smbd.service is wantedBy that target), and that boot edge would start smbd before any unlock. ConditionPathIsMountPoint is what stops it from serving the empty, offline mount directory. Only smbd serves files from the pool and can hold it busy during lock, so leave samba.target, nmbd, and winbindd untouched.
NFS
The same approach works for NFS. Export the braid mount point and control access at the network level:
services.nfs.server = {
enable = true;
exports = ''
${config.braid.mountPoint} 192.168.1.0/24(rw,sync,no_subtree_check,no_root_squash)
'';
};
Adjust the subnet and options for your network. See exports(5) for the full option reference.
The same wantedBy + bindsTo + after + ConditionPathIsMountPoint pattern on braid-online.service (see “Binding shares to the pool lifecycle” under Samba above) applies to nfs-server.service if you want NFS to stop before braid lock runs umount and start again after braid unlock. As with Samba, the condition gates NixOS’s default nfs-server.service boot-start edge (wantedBy = [ "multi-user.target" ]) against an offline braid mount point.
Auto-suspend integration
If you enable braid.autoSuspend, active SMB and NFS connections automatically block suspend. This is auto-detected from whether services.samba or services.nfs.server is enabled in your NixOS config – no extra configuration needed.
Related
- NixOS configuration – braid.poolAccessGroup option reference
- Getting started – first-time pool setup
- Mounting subvolumes – read-only service access to one subvolume
- Power management – auto-suspend with SMB/NFS awareness
Mounting subvolumes
Mount a btrfs subvolume at a custom path when a person or service should see
one part of the pool as its own filesystem. Common examples are a friendlier
path under /home, or a media path like /var/lib/jellyfin/media.
How braid mounts the pool
braid mounts the btrfs top-level subvolume (subvolid=5) at
/mnt/storage by default. Treat that as the management mount: it is where you
create subvolumes, run btrfs commands, and manage the whole pool. Consumer
services do not need access to that mount root.
The subvol= mount idiom
btrfs can mount a subvolume directly with subvol=<path>. The btrfs docs
describe the important isolation property this way: “the parent directory is
not visible and accessible”, which is “similar to a bind mount”.
For braid, that means a service can see only movies at
/var/lib/jellyfin/media without needing permission to traverse
/mnt/storage.
Recipe: mount a subvolume at a custom path
Create the subvolume while the pool is unlocked:
sudo btrfs subvolume create /mnt/storage/movies
Find the btrfs filesystem UUID from braid status (look for the FSID:
line; the JSON form is braid status --json and the field is fsid):
sudo braid status
Add a native systemd mount unit to your NixOS configuration:
systemd.mounts = [{
what = "/dev/disk/by-uuid/<btrfs-fs-uuid>";
where = "/home/dan/my-movies";
type = "btrfs";
options = "subvol=movies,ro,noatime";
wantedBy = [ "braid-online.service" ];
bindsTo = [ "braid-online.service" ];
after = [ "braid-online.service" ];
}];
Field notes:
- what points at the btrfs filesystem UUID, not an individual LUKS disk.
- where is the path where the subvolume should appear.
- type = "btrfs" selects the btrfs mount helper.
- options selects the movies subvolume. ro is optional but recommended for read-only consumers.
- wantedBy starts the mount when braid-online.service activates after braid unlock.
- bindsTo is the load-bearing lifecycle edge. It puts the mount unit in BoundBy braid-online.service, which is what braid lock stops before unmounting the pool.
- after orders mount startup after braid-online.service, so the btrfs /dev/disk/by-uuid symlink exists before systemd resolves what.
Rebuild and verify:
sudo nixos-rebuild switch
findmnt /home/dan/my-movies
systemctl show -P BoundBy braid-online.service
The escaped mount unit name, for example home-dan-my\x2dmovies.mount, should
appear in BoundBy braid-online.service.
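If you are unsure what the escaped unit name for a path is, systemd-escape can derive it. The wrapper name mount_unit_name below is hypothetical; the flags are the standard systemd-escape ones.

```shell
# Print the mount unit name systemd derives from a mount path.
# $1 = absolute mount path
mount_unit_name() {
  systemd-escape --path --suffix=mount "$1"
}
```

`mount_unit_name /home/dan/my-movies` prints `home-dan-my\x2dmovies.mount`, and `mount_unit_name /var/lib/jellyfin/media` prints `var-lib-jellyfin-media.mount`. The result is the name to look for in the BoundBy output.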
subvol= vs bind mount
Both approaches can expose a subtree at another path. subvol= is the better
default for braid because it is conventional btrfs configuration, it mounts the
subvolume directly, and it does not require the consumer to traverse
/mnt/storage.
Use a bind mount only when the consumer already has permission to traverse the source mount and you need the same mounted data at multiple paths.
Why not fileSystems with x-systemd.requires?
fileSystems is fstab-shaped. systemd’s fstab options can express
Requires= and After=, but not an arbitrary BindsTo=braid-online.service
edge. Without BindsTo, the mount is not listed in
BoundBy braid-online.service, so braid lock will not stop it before
unmounting the pool. Use native systemd.mounts for lifecycle-bound subvolume
mounts. See ADR 018
for the lifecycle model.
Worked example: read-only access for Jellyfin
Create the media subvolume:
sudo btrfs subvolume create /mnt/storage/movies
Mount it where Jellyfin expects media:
systemd.mounts = [{
what = "/dev/disk/by-uuid/<btrfs-fs-uuid>";
where = "/var/lib/jellyfin/media";
type = "btrfs";
options = "subvol=movies,ro,noatime";
wantedBy = [ "braid-online.service" ];
bindsTo = [ "braid-online.service" ];
after = [ "braid-online.service" ];
}];
Grant Jellyfin read-only traversal on the subvolume contents:
sudo setfacl -R -m u:jellyfin:rx /mnt/storage/movies
sudo setfacl -R -d -m u:jellyfin:rx /mnt/storage/movies
Do not add jellyfin to storage. That would grant the daemon read-write
access across the whole pool. The ACL above scopes read access to one
subvolume, and the subvol= mount means Jellyfin does not need to traverse
/mnt/storage itself.
Bind Jellyfin to the mount unit:
services.jellyfin = {
enable = true;
openFirewall = true;
};
systemd.services.jellyfin = {
wantedBy = lib.mkForce [ "var-lib-jellyfin-media.mount" ];
bindsTo = [ "var-lib-jellyfin-media.mount" ];
after = [ "var-lib-jellyfin-media.mount" ];
unitConfig.ConditionPathIsMountPoint = "/var/lib/jellyfin/media";
};
Bind the service to var-lib-jellyfin-media.mount, not directly to
braid-online.service. That ensures Jellyfin starts only after its media path
is mounted. During braid lock, systemd stops Jellyfin first, then the
subvolume mount, then braid unmounts the management mount and closes LUKS.
The full triad pattern is the same lifecycle shape described in Sharing and permissions.
Verify:
sudo -u jellyfin ls /var/lib/jellyfin/media
sudo braid lock
systemctl is-active jellyfin.service
systemctl is-active var-lib-jellyfin-media.mount
Point the Jellyfin web UI at /var/lib/jellyfin/media. After braid lock,
both units should be inactive and the LUKS devices should be closed.
Offline mountpoint safety
braid seals the pool mountpoint immutable (chattr +i) while the pool is
offline, so a process writing /mnt/storage before the pool mounts fails with
EPERM instead of silently landing data on the root filesystem (which the pool
would then hide on mount). See
ADR 028.
This boot seal covers only the pool mountpoint (/mnt/storage). It has two
consequences for subvolume mounts:
- Subvolumes mounted under /mnt/storage are inherently protected by the parent seal – the bare mountpoint is the sealed directory. This is the safe default; prefer it.
- Subvolumes mounted at separate paths (like the /var/lib/jellyfin/media example above) are not auto-sealed. While the pool is offline the systemd.mounts unit is stopped, leaving a bare directory at that path. The unit you wired with bindsTo = braid-online.service does not write while offline, but any other process that writes the path while the pool is offline lands data on root and gets shadowed on the next mount – the same bug the boot seal fixes for /mnt/storage.
To protect a separate-path subvolume mountpoint, seal it manually with the explicit-path form while the pool is offline:
sudo braid seal-mountpoint /var/lib/jellyfin/media
This is the braid-native remedy (the appliance has no chattr on its PATH). It
reports a non-zero exit if it could not protect the path, so a failed seal is
visible. It is not self-healing – unlike the pool mountpoint, braid does not
re-seal these paths on every boot, and braid doctor does not probe them. Re-run
it after a reconfiguration that recreates the directory. To clear it later, use
braid seal-mountpoint --unseal <path>.
Troubleshooting
Symptom-oriented index for common problems. Find your symptom below and follow the resolution.
Balance fails with “No space left on device”
btrfs balance needs temporary free space to relocate chunks. braid’s balances convert both data and metadata profiles, so either side can hit ENOSPC even when space appears to be available.
Fix: Free up empty data block groups first, then retry the original operation:
sudo btrfs balance start -dusage=0 /mnt/storage
The usage=0 pass relocates only completely empty data block groups, so it does
not need temporary work space. Keep recovery balances data-only: metadata block
groups are write headroom, and balancing them can hit metadata ENOSPC and force
the filesystem read-only.
If the retry still fails, inspect data vs metadata usage:
sudo btrfs filesystem usage /mnt/storage
df’s “Used” and “Available” columns cannot distinguish data, metadata, and
snapshot references, while braid status reports the same btrfs-derived
capacity. In btrfs filesystem usage, compare the Data and Metadata used/size
ratios to see which side is the bottleneck.
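The comparison itself is just two percentages. As a sketch, the helper below takes the used and size figures you read off btrfs filesystem usage (in the same unit) and reports the fuller side. The name tighter_side is hypothetical, and it does no parsing of btrfs output.

```shell
# Hypothetical helper: report whether data or metadata is closer to full.
# $1 = data used, $2 = data size, $3 = metadata used, $4 = metadata size
# (all four in the same unit, e.g. bytes or GiB rounded to integers)
tighter_side() {
  ts_data=$(( $1 * 100 / $2 ))
  ts_meta=$(( $3 * 100 / $4 ))
  if [ "$ts_meta" -gt "$ts_data" ]; then
    echo "metadata ${ts_meta}%"
  else
    echo "data ${ts_data}%"
  fi
}
```

For example, `tighter_side 900 1000 98 100` reports `metadata 98%`: the data side still has room, but metadata is nearly full.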
If there is enough temporary work space, a non-zero data threshold can reclaim nearly-empty groups, but it moves data:
sudo btrfs balance start -dusage=10 /mnt/storage
Pool won’t mount
Symptom: braid unlock fails because pool.json is missing or corrupted.
Fix: Rebuild UUID-keyed pool.json from disk labels and LUKS UUIDs. How you
start depends on the state of pool.json – bare discover previews only when
the file is absent; over a corrupt file it refuses and points you to
discover --write.
If pool.json is missing – preview, then write:
sudo braid discover
# Shows discovered disks -- verify they look correct
sudo braid discover --write
If pool.json is corrupt or unreadable – skip the preview and rebuild in
place (bare discover refuses corrupt state before scanning):
sudo braid discover --write
The corrupt rebuild preserves the original bytes at
pool.json.corrupt-<RFC3339-UTC> before overwriting; do not remove it first.
Then unlock normally:
sudo braid unlock
discover scans /dev/disk/by-id/ for LUKS devices with braid-* labels and reconstructs the membership file. See Recovery scenarios for details.
If pool.json is healthy and UUID-keyed, discover --write refuses on
purpose. Use braid add / braid remove / braid replace for normal
membership changes. If you have deliberately decided to re-discover instead,
move the file aside before running discover --write:
sudo mv /var/lib/braid/pool.json /var/lib/braid/pool.json.manual-backup
sudo braid discover --write
Interrupted operation (pending-op.json exists)
Symptom: braid commands fail with an error about a pending operation. This happens when a previous add, remove, remove-missing, or replace was interrupted (power loss, crash, killed process).
Fix: Use braid recover:
sudo braid recover
Recover reads the pending-operation journal, opens LUKS devices and mounts the pool if needed, probes the live btrfs topology, and rebuilds pool.json from actual state. It clears the journal only after the idle/no-paused recovery path succeeds.
Important
If recover refuses owed RAID1 replay because btrfs balance state is paused, running, or unknown, it left pending-op.json in place. Inspect btrfs manually before clearing recovery state.
If devices are missing (drive failure during the interrupted operation):
sudo braid recover --allow-degraded
For scripted/unattended recovery:
echo "my-passphrase" | sudo braid recover --passphrase-stdin
See Recovery scenarios for detailed walkthroughs.
Missing device after drive failure
Symptom: braid status shows a missing device. The pool may be mounted degraded or may fail to mount.
You have two options:
Option A: Replace the disk (rebuilds data onto a new disk)
# Find the old disk name from braid status
sudo braid replace --old toshiba2 \
--new toshiba4=/dev/disk/by-id/ata-NEW_DRIVE_SERIAL
Replace copies data from surviving redundant copies onto the new disk. This restores full RAID1 redundancy. It takes hours for large disks.
Option B: Forget the missing device (no data rebuild)
# Find the missing device's btrfs devid from braid status
sudo braid remove-missing --missing-id 3
This removes the dead device entry from the btrfs filesystem. No data is rebuilt – you lose the redundant copy that was on the dead drive. The pool continues as a smaller array. Use this when you do not have a replacement disk available.
Auto-unlock fails
Symptom: Pool is not unlocked after reboot despite auto-unlock being configured.
Check the service logs:
journalctl -u braid-auto-unlock.service
Common causes:
- USB device not found: The USB drive was not plugged in or the keyDevice path is wrong. Verify with ls /dev/disk/by-id/ | grep usb.
- Keyfile not found: The USB filesystem does not contain braid.key at the root. The file must be named exactly braid.key.
- Keyfile resolves outside mount: A symlink on the USB points outside /run/braid-key/. The service refuses this for security.
- Timeout too short: The USB device takes longer to enumerate than timeoutSec. Increase it in your NixOS config.
- Missing devices: If a pool disk is dead and allowDegraded = false (the default), auto-unlock exits with code 2. Set braid.autoUnlock.allowDegraded = true to allow degraded mount.
See Auto-unlock for the setup guide.
Beeper won’t stop
Symptom: The PC speaker is beeping (initially every few seconds, then less often) due to a disk health alert.
Fix: Acknowledge the alert:
sudo braid ack
This stops the beep loop and clears the alert state. Then investigate the underlying problem:
sudo braid status
sudo braid doctor
braid commands blocked by “another operation in progress”
Symptom: braid unlock, braid add, or braid recover fails with a message about another braid operation holding the pool lock.
The pool-mutating commands acquire an exclusive lock on /run/braid-pool.lock. If a previous command is still running (or crashed without releasing the lock), new commands fail fast.
Fix: Wait for the running command to finish. If the previous command crashed, the lock file is released automatically (it is a flock on a /run/ file, which is tmpfs and cleared on reboot). If you need to proceed before a reboot:
# Check if any braid process is still running
pgrep -a braid
# If nothing is running, the lock was released — retry your command
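If you would rather test the lock itself than look for processes, a non-blocking flock probe is one option. This is a sketch, not a braid feature: it assumes the pool lock is an ordinary flock(2) lock that the util-linux flock tool can contend for, and the function name lock_is_free is made up. The probe takes the lock for an instant, so a braid command that starts at that exact moment could fail fast; flock also creates the file if it does not exist.

```shell
# Report whether a flock-style lock file is currently free.
# Exit status 0 means the lock could be taken (and was released again);
# non-zero means another process holds it or the file could not be opened.
lock_is_free() {
  flock --nonblock "$1" true 2>/dev/null
}
# Example (as root): lock_is_free /run/braid-pool.lock && echo free || echo held
```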
Scrub won’t start
Symptom: systemctl status braid-scrub.timer shows the timer is inactive.
The scrub timer is lifecycle-bound to braid-online.service. It only runs while the pool is unlocked and mounted.
If a scrub was cancelled by lock or shutdown, braid resumes the partial scrub
the next time the pool comes online.
# Check pool state
sudo braid status
# If pool is offline, unlock it
sudo braid unlock
# Timer should now be active
systemctl status braid-scrub.timer
Scrub reported errors
Symptom: braid status shows Last scrub: <ts> (N errors) or
braid monitor raised a btrfs error alert after a scrub.
The scrub error count braid reports is authoritative – braid parses it from
btrfs scrub status. Journal lines are diagnostic clues, not a complete
per-error ledger: the kernel emits scrub messages through rate-limited helpers,
so a busy or bursty scrub can produce fewer journal lines than the count. A
non-zero count with sparse or missing journal lines is not a braid bug – it
usually means the kernel dropped log entries to stay under its rate limit.
Use the command printed under the scrub status, or run journalctl directly:
sudo journalctl -k --since '<scrub-start-time>' --grep 'BTRFS.*(at logical.*on (dev|mirror)|super block at physical)'
Output comes in two distinct grammars depending on whether the error is in a data/metadata extent or in a superblock copy.
Extent errors (data and metadata). Each affected sector may log a repair-summary line:
- Corrected via RAID1 mirror: fixed up error at logical N on dev /dev/mapper/braid-X physical N (or ... on mirror N when the source mirror has no device). btrfs RAID1 read the healthy mirror and wrote it back over the bad copy. No file path – corrected lines give block coordinates only. A count consisting mostly of fixed up error lines means data integrity was preserved; investigate the disk that produced the bad reads.
- Uncorrectable: unable to fixup (regular) error at logical N on dev X physical N (or ... on mirror N). RAID1 could not recover – the mirror was also bad or no mirror exists. The block is permanently damaged.
An uncorrectable extent error may also log an additional detail line that identifies what was lost. The detail emission is gated by a second rate-limit check, so it is not guaranteed to appear for every uncorrectable error. When present, the shapes are:
- Data extent, path resolved. ... at logical N on dev X, physical N, root N, inode N, offset N, length N, links N (path: subdir/victim.bin). (path: ...) is relative to the affected btrfs subvolume root, not absolute. The kernel builds it from paths_from_inode() (reference/linux/fs/btrfs/scrub.c:457, reference/linux/fs/btrfs/backref.c:2125) and does not know what mount point exposes that subvolume. Prepend the mount point of the affected subvolume (default subvolume at /mnt/storage; named subvolumes wherever you configured them).
- Data extent, path resolution failed. Same shape but ends ... path resolving failed with ret=N instead of (path: ...). Usually means the extent has no remaining inode references (file already deleted) or the inode lives in a snapshot rooted under a different subvolume than the search root.
- Metadata. ... at logical N on dev X, physical N: metadata leaf|node (level N) in tree N. Tree-block corruption – no file path because the bad block lives in a btrfs tree, not in user data. Persistent metadata errors indicate disk failure.
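For the path-resolved shape, prepending the mount point can be done mechanically. The helper below is a hypothetical sketch (scrub_abs_path is not a braid command): it pulls the text inside "(path: ...)" out of each journal line on stdin and joins it to a mount point you supply. It cannot tell which subvolume a line belongs to, so you still have to pass the right mount point.

```shell
# Turn the subvolume-relative "(path: ...)" from scrub detail lines into
# absolute paths by prepending the mount point of the affected subvolume.
# Lines without a "(path: ...)" suffix are ignored.
# Usage: scrub_abs_path <mountpoint> < journal-lines
scrub_abs_path() {
  local mnt="${1%/}"                  # tolerate a trailing slash
  sed -n 's/.*(path: \(.*\)).*/\1/p' | while IFS= read -r rel; do
    printf '%s/%s\n' "$mnt" "$rel"
  done
}
# Example: sudo journalctl -k --grep 'path: ' | scrub_abs_path /mnt/storage
```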
Superblock errors. Logged as standalone messages from scrub_supers, not
as repair-summary + detail pairs. The grammar is independent of the extent
path:
- super block at physical N devid N has bad csum
- super block at physical N devid N has bad generation N expect N
Damage to one of the device’s superblock copies. Investigate the device
(identified by devid), not a file.
For the path-resolution-failed case, you can try inode-resolve as a
best-effort:
sudo btrfs inspect-internal inode-resolve <inode> /mnt/storage
This succeeds only if the inode still exists in the subvolume rooted at the supplied path. Deleted files, extents with no remaining references, or files that live in a different subvolume will still produce no result – the kernel logged “path resolving failed” for the same reason.
A non-zero error count after a scrub means at least one block failed its
checksum or I/O. With btrfs RAID1, blocks with a healthy mirror are repaired
automatically (counted as Corrected – the fixed up lines above);
Uncorrectable means both copies were bad and the file (for data) or tree
block (for metadata) is now damaged. The journal output is your best diagnostic
surface, but treat it as evidence rather than a complete ledger: rely on the
scrub count for “how many,” and on the journal for “what kind, and where the
kernel could log it.” Restore affected files from backup and run braid ack
once you have investigated.
SMB/NFS service inactive after braid lock
Symptom: systemctl status samba-smbd.service (or nfs-server.service) shows inactive (dead) immediately after you ran braid lock.
This is intentional. On NixOS module installs, braid lock stops every service bound to braid-online.service via BindsTo=braid-online.service before it unmounts the pool. The cascade prevents busy-mount unmount failures.
Fix: Run braid unlock. It reactivates braid-online.service after mount, and systemd restarts every consumer that is also WantedBy=braid-online.service.
If the service does not restart on braid unlock, it is wired for the stop side (BindsTo) but not the start side (WantedBy). The recommended setup wires the share into the full pool lifecycle – see Binding shares to the pool lifecycle.
Related
- Recovery scenarios – detailed recovery walkthroughs
- NixOS configuration – module option reference
- Monitoring and alerts – alert system details; see “Scrub reported errors” above for the post-alert investigation steps.
Recovery scenarios
Detailed walkthroughs for recovering from failures. Read this when braid status or another command tells you something is wrong, or when you are planning for failure ahead of time.
Overview: discover vs recover
braid has two recovery commands that solve different problems:
| Command | When to use | What it does |
|---|---|---|
| braid discover --write | pool.json is missing or corrupted | Scans disk labels to rebuild pool.json |
| braid recover | pending-op.json exists (interrupted mutation) | Opens pool, probes live topology, rebuilds pool.json, and clears the journal after the idle/no-paused recovery path succeeds; preserves the journal when owed RAID1 replay finds a paused, running, or unknown balance state |
discover solves metadata loss – the CLI’s record of which disks belong to the pool is gone, but the disks themselves are fine. It reads LUKS labels (braid-<name>) and LUKS UUIDs from /dev/disk/by-id/ devices to reconstruct UUID-keyed membership.
recover solves interrupted operations – an add, remove, remove-missing, or replace was killed mid-flight (power loss, crash, OOM). The pending-operation journal (/var/lib/braid/pending-op.json) records what was in progress. Recover opens the pool, inspects what actually happened on disk, and rebuilds pool.json to match reality.
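The choice between the two commands comes down to which state files exist. The sketch below encodes only that file-existence check; it is not braid's own logic, the function name suggest_recovery is made up, and it does not detect a corrupt pool.json.

```shell
# Suggest which recovery command applies, based only on which state files
# exist. The state directory is a parameter so the logic can be tried on a
# scratch directory; it defaults to the documented /var/lib/braid.
suggest_recovery() {
  local dir="${1:-/var/lib/braid}"
  if [ -e "$dir/pending-op.json" ]; then
    echo "braid recover"      # interrupted mutation: journal present
  elif [ ! -e "$dir/pool.json" ]; then
    echo "braid discover"     # membership file missing: preview first
  else
    echo "braid unlock"       # both files look normal (pool.json may still be corrupt)
  fi
}
# Example: suggest_recovery
```

The journal check comes first because discover --write refuses to run while pending-op.json exists.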
Lost pool.json
Symptom: braid unlock fails because /var/lib/braid/pool.json does not exist.
Cause: Accidental deletion, filesystem corruption, or migrating to a new NixOS install.
Steps
- Verify no pending operation exists:
ls /var/lib/braid/pending-op.json
# If this file exists, use `braid recover` instead (see below)
- Scan for braid disks:
sudo braid discover
Output looks like:
toshiba1 = /dev/disk/by-id/ata-TOSHIBA_MN08ACA16T_XXXX
toshiba2 = /dev/disk/by-id/ata-TOSHIBA_MN08ACA16T_YYYY
toshiba3 = /dev/disk/by-id/ata-TOSHIBA_MN08ACA16T_ZZZZ
- Verify the output matches your expected pool members. Then write:
sudo braid discover --write
This creates /var/lib/braid/pool.json.
If you can name the expected member count ahead of time, record it from your
own records or prior braid status output and pass it as a fail-closed guard:
EXPECTED=3
sudo braid discover --write --expect-count="$EXPECTED"
- Unlock normally:
sudo braid unlock
Notes
- For a healthy UUID-keyed pool.json, discover --write refuses – use braid add / braid remove / braid replace to mutate membership instead.
- For a corrupt or off-schema existing pool.json, discover --write rebuilds in place; no manual remove step is needed. The original bytes are preserved at pool.json.corrupt-<RFC3339-UTC> adjacent to the new file in case manual forensic recovery is needed (e.g. extracting a devid for a null_underlying member). The snapshot is a hard precondition: if it cannot be written (full disk, read-only state directory), discover --write refuses rather than destroy the corrupt original; free disk space or fix permissions and retry.
- discover --write refuses to run if pending-op.json exists. Use braid recover instead. (Bare discover is read-only and runs regardless.)
- discover only finds LUKS2 devices. LUKS1 devices with braid labels are skipped with a warning.
- The rebuilt pool.json is keyed by LUKS UUID. Disk names are stored in each member value for command input and display.
- When multiple /dev/disk/by-id/ symlinks point to the same device, discover picks the most stable one (wwn > nvme > scsi > ata > usb).
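The stability ordering in the last note can be pictured as a prefix-priority pick. The sketch below is an assumption about how such a preference could work, not braid's implementation: it matches on the leading wwn-, nvme-, scsi-, ata-, and usb- prefixes of by-id names, and the function name pick_stable_by_id is made up.

```shell
# Given several /dev/disk/by-id/ names for the same device, print the one
# with the most stable prefix (wwn > nvme > scsi > ata > usb).
# Returns non-zero if none of the names carries a known prefix.
pick_stable_by_id() {
  local prefix name
  for prefix in wwn- nvme- scsi- ata- usb-; do
    for name in "$@"; do
      case "$name" in
        "$prefix"*) printf '%s\n' "$name"; return 0 ;;
      esac
    done
  done
  return 1
}
# Example: pick_stable_by_id ata-TOSHIBA_MN08ACA16T_XXXX wwn-0x5000039aaaaaaaaa
```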
Interrupted add/remove/replace
Symptom: braid commands fail with a message about a pending operation. ls /var/lib/braid/pending-op.json confirms the journal file exists.
Cause: A pool mutation (add, remove, remove-missing, replace) was interrupted before it could complete. The journal records the operation type, the pre-operation membership, and the target membership. Existing-pool add journals also record a phase: PoolMutation for unfinished disk preparation or btrfs membership, and PostAddBalanceRaid1 after membership is committed but balance work remains.
Steps
- Preview what recover will do:
sudo braid recover --dry-run
This shows the recovery plan without making changes: which LUKS devices will be opened, whether the pool needs mounting, and the final pool.json state.
- Run recovery:
sudo braid recover
Recover will:
- Open the LUKS devices needed for the journal phase
- Mount the btrfs pool
- Probe the live btrfs topology to determine what actually happened
- For existing-pool add PoolMutation, first open and scan any already-committed journaled add targets that can be reconciled without wiping or adding
- For add PoolMutation, finish only the journaled add targets that are not already live
- For add PostAddBalanceRaid1, skip all disk preparation and btrfs add steps, then run the owed RAID1 balance only when btrfs balance state is idle; preserve pending-op.json when a paused, running, or unknown balance state requires manual inspection
- Rebuild or repair pool.json only when live membership is complete
- Clear pending-op.json only after required membership and balance work is complete
- Verify:
sudo braid status
Interrupted between returned-disk wipe and add
If an existing braid-labeled disk was being returned to the pool and the add was interrupted after wipefs --types btrfs but before btrfs device add, run:
sudo braid recover
Recover replays the add from the journaled returned-disk target. Do not wipe the disk and retry it as a fresh add; the journal still records the checked LUKS identity and expected pool FSID.
Interrupted fresh-disk add
For an interrupted fresh-disk add, recover replays the format, optional keyfile enrollment, LUKS header backup, mapper open, and btrfs device add from the journaled options when the disk is present.
If the disk is absent or has a different LUKS label than the journal records, recover fails and leaves pending-op.json in place. Reconnect the original disk or replace the target, then rerun sudo braid recover.
Pending-op file corruption
Symptom: braid reports that /var/lib/braid/pending-op.json could not be parsed.
The remediation phrase is:
Remove /var/lib/braid/pending-op.json after manual reconciliation (see docs/internals/luks-unlock.md) and re-run.
It is safe to remove pending-op.json only when one of these is true:
| Situation | Safe? |
|---|---|
| No disk-level mutation committed: no LUKS format, no btrfs device add, no cryptsetup open of a fresh-format target | Yes |
| braid status confirms the live pool already reflects the intended state and the journal is stale | Yes |
| A mutation is partially complete, such as mkfs.btrfs ran but btrfs device add did not, or replace is paused mid-rebuild | No |
When it is not safe, keep the journal in place and investigate the interrupted operation before editing state.
Out-of-band reformat during recovery
Recover refuses if a target disk’s live LUKS UUID no longer matches the journal. This catches a disk that was reformatted, swapped, or cloned after the original operation started.
Messages to search for:
- add recovery aborted: target ... LUKS UUID mismatch
- recover replace target '...' LUKS UUID mismatch: expected ..., found ...
Do not force the journal forward. Investigate the foreign reformat or swapped disk, restore the intended disk if possible, and rerun recovery.
See also Unlock refused by a foreign or mismatched disk for the same identity check on the braid unlock path.
Never-enriched member with null-underlying mapper
A member can be known to btrfs by devid while its LUKS backing device is gone (cryptsetup status reports device: (null)). If the member was never enriched with a persisted devid, recovery cannot bind that null-underlying mapper back to a UUID-keyed membership entry.
Let braid recover complete when it can preserve the member. The next read-side command observes the live devid and braid remove-missing becomes available again if the device is truly gone.
Duplicate or missing devid in journal snapshot
Recovery may refuse with internal errors equivalent to duplicate journaled devids or no member for a journaled devid. This means the journal snapshot cannot safely resolve a btrfs devid to a UUID-keyed member.
Do not edit pool.json; that resolution did not consult it. Re-run recovery only after manual reconciliation of pending-op.json.
Committed-but-closed add target
If the journaled add target is already a live pool member but its mapper is closed when recover starts, recover opens and scans it during the reconciliation pass. After the live-pool re-probe, the target is included in pool.json and is not re-added.
This can still prompt for the pool passphrase even when the pool is already mounted, because the target mapper may need to be opened before recover can discover that it already committed.
With missing devices
If a drive failed during the interrupted operation:
sudo braid recover --allow-degraded
Without --allow-degraded, recover exits with code 2 when devices are missing. The degraded flag allows mounting with missing devices so recovery can complete. Redundancy is reduced until the missing device is replaced.
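A script can branch on that exit code instead of passing --allow-degraded blindly. The wrapper below is a sketch under assumptions: only exit code 2 (missing devices) comes from this page, the messages and the function name recover_report are made up, and the BRAID variable exists only so the wrapper can be exercised against a stand-in command.

```shell
# Run `braid recover` and translate the documented exit code 2 (missing
# devices) into a hint. Extra arguments are passed through to recover.
# BRAID overrides the command name; it defaults to "braid".
recover_report() {
  local braid_cmd="${BRAID:-braid}" rc=0
  "$braid_cmd" recover "$@" || rc=$?
  case "$rc" in
    0) echo "recovered" ;;
    2) echo "devices missing: confirm the disk is gone, then rerun with --allow-degraded" ;;
    *) echo "recover failed with exit code $rc" ;;
  esac
  return "$rc"
}
# Example (as root): recover_report
```

Keeping the degraded retry a manual step means a loose cable does not silently turn into a degraded mount.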
Scripted recovery
For unattended recovery (e.g. from a remote script):
echo "my-passphrase" | sudo braid recover --passphrase-stdin
Or with a passphrase file:
sudo braid recover --passphrase-file /path/to/passphrase
Recover for a replace journal when the pool is already mounted
Symptom: sudo braid recover exits with recover refuses to probe an already-mounted pool when the journal records a replace ... and instructs you to run braid lock first.
Cause: The pool was mounted by something other than braid recover itself (typically a manual cryptsetup open + mount after a crash, since braid unlock and braid-auto-unlock.service both refuse to mount when a pending-op journal exists). For a replace journal, the kernel may have resumed an interrupted dev_replace on that mount session, leaving stale in-memory device state that recover cannot distinguish from real topology. The cycle that scrubs this state needs to unmount and remount, which is unsafe on a mount recover does not own.
Steps
sudo braid lock # works with a journal present -- no pending-op preflight
sudo braid recover # opens its own mount and runs the relock cycle
braid lock unmounts the pool and closes the LUKS mappers. braid recover then opens a fresh mount session, finishes any in-progress kernel dev_replace, and runs the umount-and-remount cycle that clears stale btrfs_fs_devices – the standard happy path for replace recovery.
Unlock refused by a foreign or mismatched disk
Symptom: braid unlock exits with LUKS UUID mismatch. A disk at a recorded by-id slot reports a LUKS UUID that differs from the one in pool.json; the error names the disk, its by-id path, and the expected vs found UUID.
Cause: The disk was swapped, cloned, or reformatted out of band, so its LUKS identity no longer matches the recorded member. This is a hard refusal during probing, before any mapper opens. --allow-degraded does not bypass it – that flag only covers missing disks, and this disk is present.
If the swap was unintended
Detach the foreign disk and reattach the original. braid unlock then succeeds.
If the swap was intentional
braid replace requires the pool mounted, but the present mismatched disk blocks the mount. Make the slot read as missing first, then replace:
- Detach the foreign disk so the member reads as absent.
- Mount the pool degraded:
sudo braid unlock --allow-degraded
- Replace the now-missing member following Missing disk -> Option A: Replace the disk. braid replace prepares its own --new disk; see braid replace for how it handles a disk that already carries a LUKS header.
See also Out-of-band reformat during recovery for the same identity check on the braid recover path (a different trigger).
Missing disk (drive failure)
Symptom: braid status shows a device as missing. The pool may be mounted degraded or may refuse to mount.
Unlock with a missing disk
If the pool is not mounted:
sudo braid unlock --allow-degraded
This mounts the pool in degraded mode. All data is still accessible (btrfs RAID1 keeps a copy on the surviving disk(s)), but the pool is running with reduced redundancy until you replace the dead drive.
Hot-unplug while pool is mounted
If a drive is physically disconnected while the pool is mounted, its LUKS
mapper can remain open with cryptsetup status reporting device: (null).
btrfs continues to list the devid but has not yet promoted it to MISSING.
braid status reports the devid – it contributes to missing_count and
appears in missing_devids – but braid remove-missing --missing-id N and
braid replace (with or without --missing-id) refuse the devid because they
only act on btrfs-authoritative MISSING entries.
To make progress:
- Confirm the disk is truly gone (not just a loose cable).
- Relock and re-unlock the pool degraded so btrfs re-evaluates membership and promotes the devid:
sudo braid lock
sudo braid unlock --allow-degraded
- Re-run braid status – the devid should now appear as authoritatively MISSING – then retry braid remove-missing or braid replace.
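To confirm that a mapper is in the null-underlying state described above, you can check the device: line of cryptsetup status. The helper is a hypothetical sketch (mapper_is_null_underlying is not a braid command) and assumes the status output contains a line of the form "device: (null)", as this section describes.

```shell
# Succeed if `cryptsetup status <mapper>` output on stdin reports that the
# backing device is gone, i.e. contains a "device: (null)" line.
mapper_is_null_underlying() {
  grep -Eq '^[[:space:]]*device:[[:space:]]*\(null\)'
}
# Example (as root):
#   cryptsetup status braid-toshiba2 | mapper_is_null_underlying && echo "backing device gone"
```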
Option A: Replace the disk
Replaces the dead disk with a new one, rebuilding data from surviving copies:
sudo braid replace --old toshiba2 \
--new toshiba4=/dev/disk/by-id/ata-NEW_DRIVE_SERIAL
--old identifies the missing member. If you want to cross-check the btrfs
devid from braid status, add --missing-id after the required args:
sudo braid replace --old toshiba2 \
--new toshiba4=/dev/disk/by-id/ata-NEW_DRIVE_SERIAL \
--missing-id 3
Replace runs btrfs replace start -B under the hood. braid replace is a long-running online operation: the command waits in the foreground and shows progress while the pool remains usable. It can take hours for large drives, so run it from a shell you can leave open (or a tmux/screen session). From another shell, braid status and braid tui can show progress independently.
Option B: Remove the missing device
Forgets the dead device without rebuilding data:
# Find the missing device's btrfs devid from braid status
sudo braid remove-missing --missing-id 3
Use this when you do not have a replacement disk. The pool continues with fewer disks and reduced capacity. Data that was only on the dead drive is lost (but in RAID1, all data has a second copy on another drive).
When this clears the last missing device and 2+ disks remain, remove-missing blocks on a follow-up soft RAID1 balance to restore redundancy on chunks written as single during degraded operation. You will see [wait] pool: restoring RAID1 redundancy... then [ok] pool: RAID1 redundancy restored before the command returns. The wait scales with how much data was written while degraded: an idle pool finishes in seconds, while a pool written to heavily during degraded mode can take longer. A sleep inhibitor is held for the entire operation. See braid remove-missing for the full sequence.
Verify:
sudo braid status
A successful result shows no missing devices and no single profile rows for data or metadata.
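The same check can be made against btrfs directly. The sketch below assumes the line format of btrfs filesystem df ("Data, RAID1: total=..., used=..."); the function name has_single_chunks is made up. It deliberately ignores the GlobalReserve row, which btrfs always reports as single.

```shell
# Succeed if `btrfs filesystem df` output on stdin still lists Data or
# Metadata block groups with the single profile. GlobalReserve is always
# "single" and is intentionally not matched.
has_single_chunks() {
  grep -Eq '^(Data|Metadata), single:'
}
# Example: sudo btrfs filesystem df /mnt/storage | has_single_chunks && echo "redundancy not yet restored"
```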
Choosing between replace and remove-missing
| | replace | remove-missing |
|---|---|---|
| Requires new disk | Yes | No |
| Rebuilds data | Yes | No |
| Restores redundancy | Yes | Partial: restores RAID1 profiles when 2+ disks remain, but does not add replacement capacity |
| Duration | Hours (large disks) | Minutes |
| When to use | You have a replacement | No replacement available |
Degraded mount
A degraded mount means at least one pool disk is missing. The pool is usable, but it runs with reduced redundancy for the missing device’s share of data.
When degraded mounts happen
- braid unlock --allow-degraded – explicit request
- braid recover --allow-degraded – recovery with missing devices
- braid.autoUnlock.allowDegraded = true – auto-unlock config
Risks
- Reduced redundancy – the pool is short the missing device’s mirror copy of existing data, and on 2-disk pools new writes are allocated as single-profile chunks. A further drive failure could lose data.
- No self-healing – btrfs cannot repair corrupted blocks from a redundant copy if the copy was on the missing device.
Resolution
Replace the missing disk as soon as possible:
sudo braid replace --old <missing-name> \
--new <new-name>=/dev/disk/by-id/<new-drive>
After replace completes, the pool is fully redundant again.
Recovery decision tree
braid command fails
├── "pending operation" error
│ └── braid recover [--allow-degraded]
├── pool.json missing
│ └── braid discover --write → braid unlock
├── "LUKS UUID mismatch" error
│ └── see "Unlock refused by a foreign or mismatched disk"
├── missing device / won't mount
│ ├── braid unlock --allow-degraded
│ └── then: braid replace or braid remove-missing
└── something else
└── braid doctor → check troubleshooting guide
State files reference
All state lives under /var/lib/braid/:
| File | Purpose |
|---|---|
| pool.json | UUID-keyed pool membership; each value stores disk name, by-id path, prior devid, and added-at timestamp |
| pending-op.json | UUID-keyed pending operation journal (present only during mutations) |
| acked-stats.json | Acknowledged btrfs device stats baseline |
| smartd-alert | Flag file set by smartd alert script |
| alert-latch.json | Active alert state |
| luks-headers/ | LUKS header backups |
Related
- Troubleshooting – symptom-oriented quick fixes
- NixOS configuration – autoUnlock.allowDegraded and other options
braid add
Add one or more disks to the braid pool. Creates a new pool if none exists, or expands an existing one.
When to use it
- Setting up a new NAS (bootstrap with one or more disks)
- Expanding storage by adding a new drive to an existing pool
Basic example
sudo braid add toshiba1=/dev/disk/by-id/ata-TOSHIBA_MN07ACA12T_1234
Common variations
Bootstrap a new pool with two disks (creates RAID1 immediately):
sudo braid add \
toshiba1=/dev/disk/by-id/ata-TOSHIBA_MN07ACA12T_1234 \
toshiba2=/dev/disk/by-id/ata-TOSHIBA_MN07ACA12T_5678
Add a disk to an existing pool:
sudo braid add toshiba3=/dev/disk/by-id/ata-TOSHIBA_MN07ACA12T_9012
Preview what would happen without making changes:
sudo braid add toshiba1=/dev/disk/by-id/ata-TOSHIBA_MN07ACA12T_1234 --dry-run
Skip the confirmation prompt (for scripting):
sudo braid add toshiba1=/dev/disk/by-id/ata-TOSHIBA_MN07ACA12T_1234 --yes
Pass passphrase non-interactively:
echo -n 'hunter2' | sudo braid add toshiba1=/dev/disk/by-id/ata-TOSHIBA_MN07ACA12T_1234 --passphrase-stdin
sudo braid add toshiba1=/dev/disk/by-id/ata-TOSHIBA_MN07ACA12T_1234 --passphrase-file /tmp/pass.txt
Enroll a keyfile for auto-unlock from a mounted USB drive at the same time:
sudo braid add toshiba1=/dev/disk/by-id/ata-TOSHIBA_MN07ACA12T_1234 --enroll /mnt/usb
Mount the USB first so the --enroll directory refers to removable media,
not persistent host storage.
Important flags
| Flag | Purpose |
|---|---|
| --dry-run | Show what would happen without executing |
| --yes | Skip interactive confirmation |
| --passphrase-stdin | Read passphrase from stdin instead of TTY prompt |
| --passphrase-file <path> | Read passphrase from a file (conflicts with --passphrase-stdin) |
| --enroll <dir> | Enroll braid.key from this directory into LUKS slot 1 on each adopted disk – fresh or returning – whose slot 1 is empty; idempotent skip if the keyfile already authenticates slot 1 |
| --luks-format-arg=<ARG> | Advanced: pass one raw argument to cryptsetup luksFormat, repeated once per argument; always use the equals form (e.g. --luks-format-arg=--pbkdf). braid refuses flags it manages itself – identity, key-material, integrity, and on-disk-layout options such as --uuid, --label, --type, --key-file, and offset/sizing flags. |
| --progress auto\|always\|never | Control progress display (default: auto) |
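Because --luks-format-arg takes exactly one raw argument per occurrence, an option with a value needs two occurrences. The helper below is a hypothetical convenience (luks_format_flags is not part of braid) that expands a list of raw cryptsetup arguments into the equals form; whether braid accepts a particular argument is still decided by braid.

```shell
# Expand raw cryptsetup luksFormat arguments into repeated
# --luks-format-arg=<ARG> flags, one per line, always in the equals form.
luks_format_flags() {
  local arg
  for arg in "$@"; do
    printf '%s%s\n' '--luks-format-arg=' "$arg"
  done
}
# Example: luks_format_flags --pbkdf argon2id
# prints:
#   --luks-format-arg=--pbkdf
#   --luks-format-arg=argon2id
```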
Disk spec format
Each disk is specified as NAME=PATH, where:
- NAME is a short label you choose (e.g. toshiba1)
- PATH is the /dev/disk/by-id/ stable device path
The name is stored in pool.json and used in LUKS mapper names (braid-toshiba1), LUKS labels, and all future commands. The persistent member identity is the LUKS UUID, not the name.
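A script that builds disk specs can check the NAME=PATH shape before calling braid. This is a sketch of that shape only, not braid's validation: the function name parse_disk_spec and its error messages are made up, and braid may accept or reject specs this check does not anticipate.

```shell
# Split a NAME=PATH disk spec and print the derived mapper name.
# Rejects specs with no "=", an empty NAME, or a PATH outside /dev/disk/by-id/.
parse_disk_spec() {
  local spec="$1" name path
  case "$spec" in
    *=*) ;;
    *) echo "invalid disk spec (expected NAME=PATH): $spec" >&2; return 1 ;;
  esac
  name="${spec%%=*}"                 # text before the first "="
  path="${spec#*=}"                  # text after the first "="
  if [ -z "$name" ]; then
    echo "disk spec has an empty NAME: $spec" >&2; return 1
  fi
  case "$path" in
    /dev/disk/by-id/?*) ;;
    *) echo "PATH should be a /dev/disk/by-id/ path: $path" >&2; return 1 ;;
  esac
  printf 'name=%s mapper=braid-%s path=%s\n' "$name" "$name" "$path"
}
# Example: parse_disk_spec toshiba1=/dev/disk/by-id/ata-TOSHIBA_MN07ACA12T_1234
```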
What happens under the hood
- Probes each disk to determine its state (fresh, braid-labeled, or foreign)
- Shows a confirmation prompt with the disk’s name and by-id path, plus its model/size/serial (best-effort from the live device via lsblk – omitted if unavailable)
- For fresh disks: pre-generates a LUKS UUID, LUKS-formats the disk with the pool passphrase and braid-<name> label, enrolls the --enroll keyfile into slot 1 if provided, creates a LUKS header backup, and opens the LUKS mapper. See Pending LUKS header backups – copy each .luksheader off-system and delete the local copy.
- If no pool exists: creates a btrfs filesystem (RAID1 if 2+ disks, single if 1 disk; braid explicitly pins the block-group-tree feature bit so that bit is visible and stable across toolchain versions – see ADR-027)
- If a pool exists: writes a phased UUID-keyed journal, adds the device to the existing btrfs filesystem, records the new membership in pool.json, then advances the journal to the balance phase
- If the pool now has 2+ disks: balances data to RAID1, then clears the journal – unless the pool has a missing device, in which case the balance is skipped (a [skip] note explains why). Redundancy is restored later by remove-missing or replace, not by the degraded add.
Keyfile enrollment (--enroll DIR): braid enrolls braid.key into LUKS slot 1 on every adopted disk – fresh or returning. On a fresh disk slot 1 is always empty, so the keyfile is always added. On a returning braid disk braid first probes the keyfile: if it already authenticates slot 1 the enrollment is an idempotent skip with no slot change and no new header backup; if slot 1 is empty the keyfile is added. (If slot 1 holds a different, unknown key braid refuses – see Safety checks.) The keyfile is added before the header backup so the backup captures slot 1.
A sleep inhibitor is held during all irreversible operations to prevent the system from suspending mid-operation.
If a btrfs exclusive operation (a running balance, device add/remove/replace, resize, or swap activate) is already in flight on the pool, braid does not fail – its btrfs commands queue behind the in-flight operation (via --enqueue) and the kernel runs them when the pool is free. A paused balance is the exception and is refused (see Safety checks below).
Disk acceptance rules
braid classifies each disk before acting:
- Fresh disk (no LUKS): accepted. LUKS-formatted with the pool passphrase.
- Returning braid disk (braid-labeled LUKS, btrfs FSID matches the pool): accepted as a recovery add. The disk is re-joined to the pool without reformatting. If the old btrfs signature would make btrfs device add refuse the disk, braid first runs the narrow wipe wipefs --all --types btrfs on the verified mapper, then uses btrfs device add -f.
- Non-braid LUKS: refused. braid will not adopt a LUKS device it did not create.
- Braid-labeled, wrong pool: refused. The disk belongs to a different btrfs filesystem.
- Braid-labeled, no btrfs superblock: refused. The disk’s identity is ambiguous (could be partial init, clean eviction, or manual wipe). Wipe the disk and add as fresh.
- Braid-labeled, pool not mounted: refused during bootstrap. Identity cannot be verified without a mounted pool.
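The acceptance rules above can be read as a small decision table. A minimal sketch in shell, assuming four inputs that braid would have to probe for itself: the function name classify_disk, its arguments, and the order in which the checks run are all hypothetical and are not taken from braid's source.

```shell
# classify_disk HAS_LUKS BRAID_LABEL POOL_MOUNTED FSID
#   HAS_LUKS, BRAID_LABEL, POOL_MOUNTED: yes|no
#   FSID: match|mismatch|none  (none = no btrfs superblock found)
# Hypothetical helper: it restates the documented rules, nothing more.
classify_disk() {
    if [ "$1" = no ]; then
        # No LUKS header at all: a fresh disk.
        echo "accept: fresh disk"
    elif [ "$2" = no ]; then
        # LUKS that braid did not create is never adopted.
        echo "refuse: non-braid LUKS"
    elif [ "$3" = no ]; then
        # Without a mounted pool there is nothing to compare the FSID to.
        echo "refuse: pool not mounted"
    elif [ "$4" = match ]; then
        echo "accept: returning braid disk"
    elif [ "$4" = mismatch ]; then
        echo "refuse: wrong pool"
    else
        echo "refuse: no btrfs superblock"
    fi
}
```

For example, classify_disk yes yes yes match prints the recovery-add verdict, while classify_disk yes yes yes none prints the ambiguous-identity refusal.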
Safety checks / refusal cases
- Rejects duplicate disk names in the same command
- Rejects disks that conflict with existing pool membership (same LUKS UUID, same name, or same by-id path)
- Rejects absent disks (not plugged in)
- Verifies the passphrase against an existing pool member before formatting new disks
- Warns if the pool has missing devices but does not refuse: braid add still adds the new disk, but skips the RAID1 convert balance (it surfaces a [skip] note), so the pool stays degraded and redundancy is not restored at add time. To repair, either run braid replace --old <missing-name> --new <new-name>=/dev/disk/by-id/<...> to swap in a new disk for the missing member, or – on a 2-disk degraded pool where remove-missing alone would refuse (it cannot drop RAID1 below two devices) – run braid add then braid remove-missing to drop the dead member and rebalance onto the new disk.
- Warns if existing pool drives have a keyfile but --enroll was not passed
- With --enroll, refuses if an adopted disk’s LUKS slot 1 is occupied by an unknown key the keyfile does not authenticate – remove it first with cryptsetup luksKillSlot, then retry.
- Refuses if a pending operation journal (pending-op.json) exists – run braid recover to reconcile.
- Refuses if another braid operation is in progress (pool lock /run/braid-pool.lock is held) – retry once it finishes.
- Refuses if a btrfs balance is paused on the pool – resume or cancel it first. A paused balance holds the exclusive-operation lock indefinitely, so braid cannot wait it out.
- Refuses when UPS support is enabled and braid ups status cannot verify a trusted OL (utility-power) state.
Interrupted adds
Existing-pool adds recover in two phases:
- PoolMutation: disk preparation and btrfs membership are not fully committed yet. braid recover may finish formatting a fresh target, re-open a verified returned target, run the narrow btrfs-signature wipe for that returned target, and run btrfs device add.
- PostAddBalanceRaid1: membership and pool.json are committed. braid recover will not format, wipe, or add disks in this phase; it only mounts/probes the committed pool, repairs pool.json from the committed live topology if needed, runs the owed RAID1 balance when btrfs balance state is idle, and clears pending-op.json after that succeeds. A paused, running, or unknown balance state fails closed with the journal preserved.
Related commands
- braid status – check pool health and disk info
- braid remove – remove a live disk from the pool
- braid replace – replace a dead or live disk
- braid unlock – open LUKS devices and mount the pool
Related guides
- Getting started – initial setup walkthrough
braid remove
Remove a live disk from the pool. Data migrates off the disk before it is detached.
When to use it
- Shrinking the pool (retiring a drive you no longer need)
- Removing a drive that is still online and healthy
If the disk is already dead or missing, use braid replace to rebuild data onto a new disk, or braid remove-missing to forget the entry without rebuilding.
Basic example
sudo braid remove toshiba3
Common variations
Preview what would happen:
sudo braid remove toshiba3 --dry-run
Skip the confirmation prompt:
sudo braid remove toshiba3 --yes
Important flags
| Flag | Purpose |
|---|---|
| --dry-run | Show what would happen without executing |
| --yes | Skip interactive confirmation |
| --progress auto\|always\|never | Control progress display (default: auto) |
What happens under the hood
- Probes the pool to verify the disk is a live member
- Checks that remaining disks have enough free space to absorb the data being migrated
- Shows a confirmation prompt with the disk’s name and devid, its model/size/serial (best-effort from the live backing device via lsblk – omitted if unavailable), and the resulting disk count (e.g. Pool: 3 disks -> 2 disks)
- If removing the second-to-last disk (going from 2 to 1): first balances the pool from RAID1 to single profile, then removes the device
- Runs btrfs device remove to migrate all data off the disk (this is the long-running step)
- Closes the LUKS mapper on the removed disk
- Updates pool.json to remove the member’s UUID entry
A sleep inhibitor is held during data migration and cleanup.
If a btrfs exclusive operation (a running balance, device add/remove/replace, resize, or swap activate) is already in flight on the pool, braid does not fail – its btrfs commands queue behind the in-flight operation (via --enqueue) and the kernel runs them when the pool is free. A paused balance is the exception and is refused (see Safety checks below).
Safety checks / refusal cases
- Refuses if the pool is not mounted
- Refuses if the named disk is not a live member of the pool (suggests braid replace --old <missing-name> --new <new-name>=/dev/disk/by-id/<...> or braid remove-missing if missing devices are detected)
- Refuses to remove the last disk from the pool
- Refuses if there are missing devices in the pool (resolve those first)
- Refuses if remaining disks lack space to absorb the removed disk’s data (ENOSPC pre-flight)
- Warns when removal leaves a single disk (no RAID1 redundancy)
- Refuses if a pending operation journal (pending-op.json) exists – run braid recover to reconcile.
- Refuses if another braid operation is in progress (pool lock /run/braid-pool.lock is held) – retry once it finishes.
- Refuses if a btrfs balance is paused on the pool – resume or cancel it first. A paused balance holds the exclusive-operation lock indefinitely, so braid cannot wait it out.
- Refuses when UPS support is enabled and braid ups status cannot verify a trusted OL (utility-power) state.
Related commands
- braid status – see which disks are in the pool and their devids
- braid remove-missing – forget a dead/missing device entry
- braid replace – replace a disk (live or dead) with a new one
- braid add – add a disk to the pool
braid remove-missing
Forget a stale missing-device entry from the pool. This does NOT rebuild data – use braid replace for that.
When to use it
- A disk has permanently failed and you want to clean up the pool metadata without replacing it
- You have already recovered your data and just need btrfs to stop reporting the missing device
This is a destructive choice: any data that only existed on the missing disk is lost. If you want to rebuild data onto a new disk, use braid replace instead.
Basic example
Note: braid remove-missing operates only on btrfs-authoritative MISSING
devids. A drive that is hot-unplugged while the pool is mounted contributes to
missing_count and appears in missing_devids in braid status before btrfs
promotes its devid to MISSING; remove-missing refuses the devid with a
specific hot-unplug diagnostic until that promotion happens. See
Hot-unplug while pool is mounted.
First, find the missing device’s ID:
sudo braid status
Then remove it:
sudo braid remove-missing --missing-id 3
Common variations
Preview what would happen:
sudo braid remove-missing --missing-id 3 --dry-run
Skip the confirmation prompt:
sudo braid remove-missing --missing-id 3 --yes
Important flags
| Flag | Purpose |
|---|---|
| --missing-id <devid> | Target missing device by btrfs devid (required) |
| --dry-run | Show what would happen without executing |
| --yes | Skip interactive confirmation |
| --progress auto\|always\|never | Control progress display (default: auto) |
What happens under the hood
- Probes the pool to verify missing devices exist
- Validates that the specified devid is actually a missing device (not a live one)
- Resolves the btrfs devid to the UUID-keyed pool member whose persisted prior devid matches
- Shows a confirmation prompt with the disk name, devid, and the resulting disk counts (e.g. Pool: 2 present + 1 missing -> 2 disks)
- Writes a PoolMutation journal and runs btrfs device remove <devid> to clear the missing device entry
- Updates pool.json to remove the member’s UUID entry, then advances the journal to post-remove-missing maintenance
- If this was the last missing device and 2+ disks remain: runs a soft RAID1 balance (-dconvert=raid1,soft -mconvert=raid1,soft) to restore redundancy on any single-profile chunks created during degraded operation
- Clears the journal
A sleep inhibitor is held during the removal and the subsequent soft balance (if triggered).
If a btrfs exclusive operation (a running balance, device add/remove/replace, resize, or swap activate) is already in flight on the pool, braid does not fail – its btrfs commands queue behind the in-flight operation (via --enqueue) and the kernel runs them when the pool is free. A paused balance is the exception and is refused (see Safety checks below).
Safety checks / refusal cases
- Refuses if the pool is not mounted
- Refuses if no missing devices are detected
- Refuses if the specified devid belongs to a live device (use braid remove for that)
- Refuses if the specified devid is not a device in this pool
- Refuses if surviving disks lack space to absorb the missing device’s allocations (ENOSPC pre-flight), or if that pre-flight cannot run (the btrfs device usage probe failed to spawn, returned a nonzero exit, produced unparseable output, did not list the targeted missing devid, or reported an untrusted missing-device allocation shape: the targeted devid is listed more than once, carries an allocation profile braid does not model, or reports no positive Data/Metadata/System RAID1 row), when more than 1 surviving device exists
- Refuses on a 2-disk RAID1 pool with one disk missing – the kernel refuses to drop a RAID1 pool below two devices. Use braid replace --old <missing-name> --new <new-name>=/dev/disk/by-id/<...> to repair the dead disk, or braid add first and then re-run.
- Refuses if a pending operation journal (pending-op.json) exists – run braid recover to reconcile.
- Refuses if another braid operation is in progress (pool lock /run/braid-pool.lock is held) – retry once it finishes.
- Refuses if a btrfs balance is paused on the pool – resume or cancel it first. A paused balance holds the exclusive-operation lock indefinitely, so braid cannot wait it out.
- Refuses when UPS support is enabled and braid ups status cannot verify a trusted OL (utility-power) state.
Related commands
- braid status – find missing device IDs
- braid replace – replace a missing disk with a new one (rebuilds data)
- braid remove – remove a live disk
braid replace
Replace a disk with a new one using btrfs replace. Works for both live (still-online) and dead/missing disks.
When to use it
- A disk has failed and you need to rebuild data onto a replacement
- Proactively swapping a healthy disk for a larger or newer one
Basic example
The same invocation replaces a disk whether it is still live or already
dead/missing. braid resolves --old against pool.json to find the member and
detects its state automatically, so there is no mode to choose and --missing-id
is never required:
sudo braid replace --old toshiba1 --new toshiba4=/dev/disk/by-id/ata-TOSHIBA_MN07ACA12T_NEW1
Common variations
Note: braid replace operates only on btrfs-authoritative MISSING devids. A
drive that is hot-unplugged while the pool is mounted contributes to
missing_count and appears in missing_devids in braid status before btrfs
promotes its devid to MISSING; both an explicit --missing-id cross-check
and the no-flag auto-resolve path refuse the devid with a specific hot-unplug
diagnostic until that promotion happens. See
Hot-unplug while pool is mounted.
Optionally assert which missing devid you expect (braid refuses if it disagrees with pool.json):
sudo braid replace \
--old toshiba1 \
--new toshiba4=/dev/disk/by-id/ata-TOSHIBA_MN07ACA12T_NEW1 \
--missing-id 3
Preview what would happen:
sudo braid replace --old toshiba1 --new toshiba4=/dev/disk/by-id/ata-TOSHIBA_MN07ACA12T_NEW1 --dry-run
Enroll a keyfile from a mounted USB drive on the new disk:
sudo braid replace \
--old toshiba1 \
--new toshiba4=/dev/disk/by-id/ata-TOSHIBA_MN07ACA12T_NEW1 \
--enroll /mnt/usb
Mount the USB first so the --enroll directory refers to removable media,
not persistent host storage.
Pass passphrase non-interactively:
sudo braid replace --old toshiba1 --new toshiba4=/dev/disk/by-id/ata-TOSHIBA_MN07ACA12T_NEW1 --passphrase-file /tmp/pass.txt
Important flags
| Flag | Purpose |
|---|---|
| --old <name> | Name of the disk to replace |
| --new <name>=<path> | Name and by-id path of the replacement disk |
| --missing-id <devid> | Optional cross-check for a dead-disk replace: assert the missing btrfs devid. braid refuses if it disagrees with the devid pool.json records for --old. Never required. |
| --enroll <dir> | Enroll braid.key from this directory into LUKS slot 1 on the new disk |
| --dry-run | Show what would happen without executing |
| --yes | Skip interactive confirmation |
| --passphrase-stdin | Read passphrase from stdin |
| --passphrase-file <path> | Read passphrase from a file (conflicts with --passphrase-stdin) |
| --luks-format-arg=<ARG> | Advanced: pass one raw argument to cryptsetup luksFormat, repeated once per argument; always use the equals form (e.g. --luks-format-arg=--pbkdf). braid refuses flags it manages itself – identity, key-material, integrity, and on-disk-layout options such as --uuid, --label, --type, --key-file, and offset/sizing flags. |
| --progress auto\|always\|never | Control progress display (default: auto) |
What happens under the hood
For a fresh replacement disk (no LUKS):
- Pre-generates the replacement member’s LUKS UUID and LUKS-formats the new disk with the pool passphrase and a braid-<name> label
- Optionally enrolls a keyfile in slot 1
- Creates a LUKS header backup
- Opens the LUKS mapper
Then, for all replacements:
- Runs btrfs replace start to copy data from the old device (or its mirrors) to the new device
- Writes committed UUID-keyed membership to pool.json and advances the journal to post-replace maintenance
- For live replacements: closes the old disk’s LUKS mapper
- Resizes the new device to use its full capacity (important when the new disk is larger)
- For missing-disk replacements that clear the last missing device: runs a soft RAID1 balance to restore redundancy on any single-profile chunks
- Clears the journal
The fresh-disk path always produces a local LUKS header backup in step 3; the existing-LUKS path produces one only when --enroll actually adds slot 1, so an already-enrolled disk is a no-op with no new backup. See Pending LUKS header backups – copy each .luksheader off-system and delete the local copy.
A sleep inhibitor is held throughout the replace to prevent the system from suspending. Suspending mid-replace can corrupt the btrfs topology.
If a btrfs exclusive operation (a running balance, device add/remove/replace, resize, or swap activate) is already in flight on the pool, braid does not fail – its btrfs commands queue behind the in-flight operation (via --enqueue) and the kernel runs them when the pool is free. A paused balance is the exception and is refused (see Safety checks below).
Safety checks / refusal cases
- Refuses if the pool is not mounted
- Refuses if --old and --new are the same disk
- Refuses if the new disk’s LUKS UUID is already in use by the pool (registered membership or live btrfs devices) – detach the conflicting disk before retrying
- Refuses if the new disk is absent (not plugged in)
- Refuses if the new disk’s mapper capacity is smaller than the source disk’s btrfs total_bytes (read via BTRFS_IOC_DEV_INFO, the same value btrfs replace start compares against). For existing LUKS targets, mapper capacity is derived from the LUKS2 segment offset and size (dynamic means raw - offset, fixed means the segment size). For fresh-LUKS targets, braid uses cryptsetup’s default 16 MiB offset; offset-affecting --luks-format-arg flags (--offset/-o, --align-payload, --luks2-metadata-size, --luks2-keyslots-size, --sector-size) are rejected for this reason.
- For live replacements: refuses if the pool has missing devices (resolve those first)
- For missing replacements: refuses if --missing-id points to a live device
- For missing replacements: refuses if --missing-id disagrees with the devid pool.json records for --old (--old already identifies which member to rebuild)
- For missing replacements: refuses if pool.json has no recorded devid for --old – --missing-id cannot substitute, it must match the recorded devid
- Verifies the passphrase against an existing pool member before formatting
- Warns before confirmation and in --dry-run if the live source device has I/O errors (informational, does not block)
- Warns if existing pool drives have a keyfile but --enroll was not passed
- Refuses if a pending operation journal (pending-op.json) exists – run braid recover to reconcile.
- Refuses if another braid operation is in progress (pool lock /run/braid-pool.lock is held) – retry once it finishes.
- Refuses if a btrfs balance is paused on the pool – resume or cancel it first. A paused balance holds the exclusive-operation lock indefinitely, so braid cannot wait it out.
- Refuses when UPS support is enabled and braid ups status cannot verify a trusted OL (utility-power) state.
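The mapper-capacity pre-flight in the list above is plain arithmetic once the LUKS2 segment is known. The sketch below restates that rule (dynamic means raw minus offset, fixed means the segment size); mapper_capacity and fits_replace_target are hypothetical helper names, not braid functions, and all sizes are in bytes.

```shell
# cryptsetup's default LUKS2 data offset, as stated above: 16 MiB.
LUKS2_DEFAULT_OFFSET=$((16 * 1024 * 1024))

# mapper_capacity RAW_BYTES OFFSET_BYTES SEGMENT_SIZE
#   SEGMENT_SIZE is the word "dynamic" or a fixed byte count.
# Hypothetical helper that prints the usable mapper size in bytes.
mapper_capacity() {
    if [ "$3" = dynamic ]; then
        # A dynamic segment spans everything after the data offset.
        echo $(( $1 - $2 ))
    else
        # A fixed segment has exactly the recorded size.
        echo "$3"
    fi
}

# fits_replace_target MAPPER_BYTES SOURCE_TOTAL_BYTES
# Succeeds when the target is at least as large as the source device.
fits_replace_target() {
    [ "$1" -ge "$2" ]
}
```

As an illustration with made-up numbers: a fresh target with a raw size of 12000138625024 bytes and the default offset gives mapper_capacity 12000138625024 "$LUKS2_DEFAULT_OFFSET" dynamic = 12000121847808, so a source whose total_bytes exceeds that value would be refused.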
Related commands
- braid status – find device IDs and see which disks are missing
- braid remove-missing – forget a dead device without replacing it
- braid add – add a new disk (without replacing an existing one)
braid unlock
Open LUKS devices and mount the btrfs pool.
When to use it
- After a boot, to unlock and mount the pool
Unless you’ve configured braid to automatically unlock on boot (braid.autoUnlock), you must use braid unlock to mount and access the pool.
Basic example
sudo braid unlock
You will be prompted for the pool passphrase.
Common variations
Pass passphrase non-interactively (useful for scripts or remote unlock):
echo -n 'hunter2' | sudo braid unlock --passphrase-stdin
sudo braid unlock --passphrase-file /tmp/pass.txt
Unlock with a binary keyfile from a mounted USB drive (e.g. for auto-unlock via systemd):
sudo braid unlock --key-file /mnt/usb/braid.key
Mount the USB first so the keyfile path refers to removable media, not persistent host storage.
Mount in degraded mode when a disk is missing:
sudo braid unlock --allow-degraded
Preview what would happen:
sudo braid unlock --dry-run
Important flags
| Flag | Purpose |
|---|---|
| --passphrase-stdin | Read passphrase from stdin instead of TTY prompt |
| --passphrase-file <path> | Read passphrase from a file (conflicts with --passphrase-stdin) |
| --key-file <path> | Unlock with a binary keyfile instead of passphrase (conflicts with passphrase flags) |
| --allow-degraded | Allow mounting with missing devices (degraded mode) |
| --dry-run | Show what would happen without executing |
What happens under the hood
- Checks that no other braid operation is pending
- Probes each UUID-keyed member in pool.json: checks whether the by-id device is present, whether its LUKS UUID matches, and whether its LUKS mapper is already open
- Verifies the selected credential against every disk it will unlock before opening any mapper
- Opens LUKS mappers for all locked disks using the verified credential
- Runs btrfs device scan to let the kernel discover all pool members
- Mounts the btrfs filesystem with noatime, skip_balance, and subvolid=5
- If any disks are unavailable and --allow-degraded is set: mounts with the degraded option
- After mount: enriches pool.json with live btrfs device IDs and related metadata – best-effort
- Checks for a paused balance and prints a warning if one is found
If all mappers are already open and the pool is already mounted, unlock is a no-op.
On NixOS module installs
After a successful mount, braid unlock activates braid-online.service. Any unit you have wired into the pool lifecycle with WantedBy=braid-online.service (e.g. an SMB or NFS unit – see Sharing and permissions) starts as part of that activation. braid lock stops them again on the way down via the matching BindsTo=braid-online.service.
Standalone CLI installs (no NixOS module) skip this – there is no braid-online.service to activate.
Degraded mode
When a disk is missing (physically absent or with an unreadable LUKS header), unlock refuses to mount by default. The error message names the affected disk and tells you to pass --allow-degraded.
In degraded mode, the pool mounts with reduced redundancy. New writes are NOT mirrored to the missing disk. You should repair the pool as soon as possible with braid replace --old <missing-name> --new <new-name>=/dev/disk/by-id/<...>.
The exit code is 2 when a degraded mount is refused (vs. 1 for other errors), so scripts can distinguish the two cases.
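A script can branch on that distinction. A minimal sketch, assuming exit 0 means the pool mounted; describe_unlock_exit is a hypothetical helper and the messages are illustrative, not braid output.

```shell
# describe_unlock_exit CODE
# Hypothetical helper: maps a `braid unlock` exit code to a short verdict.
describe_unlock_exit() {
    case "$1" in
        0) echo "mounted" ;;
        # Exit 2 is reserved for a refused degraded mount.
        2) echo "degraded mount refused" ;;
        # Everything else is treated as a generic failure.
        *) echo "unlock failed" ;;
    esac
}

# Typical use (not run here):
#   sudo braid unlock --passphrase-file /tmp/pass.txt
#   describe_unlock_exit "$?"
```

Keeping the degraded case separate lets an unattended script stop and alert instead of retrying with --allow-degraded on its own.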
Safety checks / refusal cases
- Refuses if a pending operation journal (pending-op.json) exists – run braid recover to reconcile.
- Refuses if another braid operation is in progress (pool lock /run/braid-pool.lock is held) – retry once it finishes.
- Refuses to mount degraded without explicit --allow-degraded
- Refuses if a present disk’s LUKS UUID does not match the UUID recorded in pool.json – the disk was swapped, cloned, or reformatted out of band. The error names the disk, its by-id path, and the expected vs found UUID. This is a hard error caught during the initial probe, before any mapper opens; --allow-degraded does not bypass it (that flag only covers missing disks). If unintended, detach the foreign disk and reattach the original; if the swap was intentional, see Unlock refused by a foreign or mismatched disk.
- If any disk rejects the selected credential during verification, unlock fails before opening any mapper and names the failing disk. If another disk already accepted the same credential, that points to disk-specific credential drift outside braid.
- Does not prompt for a passphrase if all mappers are already open (idempotent re-run)
Related commands
- braid lock – unmount the pool and close LUKS mappers
- braid status – check pool health after unlocking
- braid replace – repair a missing disk after degraded unlock
braid lock
Unmount the btrfs pool and close all LUKS mappers.
When to use it
- Before shutting down or rebooting (though systemd handles this automatically)
- When you want to manually take the pool offline
- Before physically removing a disk (after braid remove)
Basic example
sudo braid lock
Common variations
Preview what would happen:
sudo braid lock --dry-run
Important flags
| Flag | Purpose |
|---|---|
| --dry-run | Show what would happen without executing |
What happens under the hood
- Checks if the pool is mounted
- Checks that no btrfs exclusive operation (balance, device remove, etc.) is running. Skipped when the pool is not mounted.
- Unmounts the btrfs filesystem, retrying up to 3 times if the device is busy (covers the brief race after stopping SMB/NFS consumers, where the kernel has not yet released the last file descriptors)
- After a successful unmount, runs btrfs device scan --forget for the planned close-set mappers (member-owned plus any orphaned braid-* mappers from a prior crash) that still exist on disk, clearing the kernel’s device registry so stale references do not race with mapper close. Skipped when there is nothing left to forget.
- Classifies live mappers by LUKS UUID/devid ownership, then closes member-owned observed mapper names, retrying up to 3 times if the device is busy
- Scans for orphaned braid-* mappers not owned by UUID-keyed membership (e.g. from a prior crash) and closes those too
If the pool is already unmounted and all mappers are already closed, lock reports “pool already locked” and exits cleanly.
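The unmount and mapper-close steps above both retry when the device is busy. The sketch below shows the general shape of such a bounded retry; it is not braid's implementation. retry_busy is a hypothetical helper, the delay between attempts is an assumption (none is documented here), and the sketch counts 3 attempts in total, which may differ from how braid counts its retries.

```shell
# retry_busy MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it succeeds or MAX_ATTEMPTS
# tries have failed. Returns 0 on success, 1 when every attempt failed.
retry_busy() {
    attempts_left=$1
    delay=$2
    shift 2
    until "$@"; do
        attempts_left=$((attempts_left - 1))
        # Give up once the attempt budget is spent.
        [ "$attempts_left" -gt 0 ] || return 1
        sleep "$delay"
    done
}

# Typical use (not run here; /mnt/storage is a placeholder mount point):
#   retry_busy 3 1 umount /mnt/storage
```

A short bounded loop like this covers the brief window after consumers stop, without hiding a process that genuinely holds the mount open.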
On NixOS module installs
When braid is installed via the NixOS module, braid lock also:
- Stops braid-scrub.timer, braid-scrub-resume-trigger.service, and braid-scrub.service before unmount.
- Stops any consumer wired into the pool lifecycle via BindsTo=braid-online.service (lock walks its reverse, BoundBy; e.g. an SMB or NFS unit you set up that way – see Sharing and permissions) before unmount.
- Stops braid-online.service itself after a successful unmount.
braid unlock reverses the third step: it reactivates braid-online.service after mount, which restarts every consumer that is also WantedBy=braid-online.service. A consumer wired with only one of the two half-works – BindsTo stops it before lock, WantedBy restarts it after unlock – so wire both; the sharing guide shows the full setup.
Standalone CLI installs (no NixOS module) skip all three – there is no braid-online.service or scrub unit to stop.
Error handling
- Refuses if another braid operation is in progress (pool lock /run/braid-pool.lock is held) – retry once it finishes.
- If unmount fails after 3 retry attempts (e.g. a process has files open on the mount), lock skips btrfs device scan --forget and still attempts to close the LUKS mappers, reporting the failure
- If a mapper close fails with “device busy” after unmount also failed, the error is downgraded to a warning (the root cause is likely the stuck unmount)
- The hint lsof <mount_point> or fuser -vm <mount_point> is printed when unmount fails, to help identify the blocking process
- If a scanned braid-* mapper’s backing LUKS UUID cannot be verified (for example because its backing device is gone or its LUKS header is unreadable), lock prints a [warn], leaves that mapper open instead of closing it, excludes it from both btrfs device scan --forget and the close step, and still exits cleanly. Re-run braid lock once the mapper’s LUKS UUID is readable. The literal cleanup incomplete summary line appears only under --dry-run; a real run surfaces the per-mapper [warn] and does not print pool already locked. See ADR-024.
Related commands
- braid unlock – open LUKS devices and mount the pool
- braid status – check pool state before locking
- braid idle – check if operations are running before locking
braid seal-mountpoint
Note
Experimental 🧪
This command is experimental: the idea or implementation is still uncertain and may be removed, replaced, or overhauled before braid v1.0.
Set the immutable attribute (chattr +i) on the pool mountpoint while it is
unmounted, so a process that writes the path before the pool mounts fails loudly
with EPERM instead of silently landing data on the root filesystem (which the
pool then hides when it mounts over it).
You rarely run this by hand: the NixOS module runs the bare form automatically
from the braid-seal-mountpoint boot/activation unit. The explicit-path forms
are maintenance levers. See
ADR 028.
When to use it
- Re-seal now after the doctor reports the mountpoint is mutable while offline (instead of waiting for the next boot or nixos-rebuild switch).
- Seal a separate-path subvolume mountpoint the boot seal does not cover (e.g. /var/lib/jellyfin/media – see Mounting subvolumes).
- Clear an orphaned old mountpoint after changing braid.mountPoint.
Basic example
Seal the configured mount point (what the boot unit runs):
sudo braid seal-mountpoint
Common variations
Seal a specific directory (e.g. an offline subvolume mountpoint):
sudo braid seal-mountpoint /var/lib/jellyfin/media
Clear the immutable attribute on a path (e.g. an orphaned old mount point):
sudo braid seal-mountpoint --unseal /mnt/old-storage
Important flags
| Flag | Purpose |
|---|---|
| PATH | Seal a specific directory instead of the configured mount point |
| --unseal | Clear the immutable attribute instead of setting it (requires PATH) |
What happens under the hood
- Opens the path as a directory (a non-directory is refused).
- Confirms the path is not currently a mountpoint (via statx’s STATX_ATTR_MOUNT_ROOT, on the same file descriptor, so a racing mount cannot cause it to seal a live filesystem root). A live mountpoint is skipped.
- Sets (or, with --unseal, clears) FS_IMMUTABLE_FL on the bare directory’s inode. The attribute is persistent – it survives unmount and reboot.
The pool mounts normally over a sealed directory; the mounted filesystem’s own root governs writes.
Exit codes
- Bare form (braid seal-mountpoint) is best-effort and always exits 0 – boot must not fail on it. Problems are logged as warnings to the journal.
- Explicit forms (braid seal-mountpoint <path> / --unseal <path>) report an honest desired-state exit code: exit 0 only if the path actually ends up in the requested state (immutable for seal, mutable for unseal), non-zero otherwise – so a manual seal that silently failed to protect a path is visible.
Error handling
- --unseal refuses the currently configured mount point – it must stay sealed while the pool is offline. Change braid.mountPoint first, then unseal the old path.
- --unseal acquires the pool lock and fails fast if another braid operation is in progress, so it cannot interleave with an unlock that would remount over the path mid-operation.
- A live mountpoint is never touched (the explicit forms report this as a failure).
- If the root filesystem does not support the immutable attribute, the bare form logs one clear warning and protection is unavailable (rare – only non-NAS roots like vfat/9p/nfs).
Related commands
- braid doctor – warns when the offline mountpoint is mutable
- braid lock – take the pool offline (the seal persists across the cycle)
- braid unlock – mount the pool over the sealed directory
braid idle
Note
Experimental 🧪
This command is experimental: the idea or implementation is still uncertain and may be removed, replaced, or overhauled before braid v1.0.
Check if the pool has any active operations. Designed for autosuspend integration.
When to use it
- As an autosuspend check to prevent the system from sleeping during a scrub or any btrfs exclusive operation (balance, device add, device remove, device replace, resize, swap activate)
- In scripts that need to wait for the pool to be idle before proceeding
Basic example
sudo braid idle
Output when the pool is mounted and idle:
idle: pool is idle
Output when the pool is not mounted (still exit 0 – nothing to protect):
idle: pool is offline
Exit codes
| Exit code | Meaning |
|---|---|
| 0 | Pool is idle, or pool is offline |
| 1 | Pool is busy (running op) or pool state could not be determined |
| 2 | Setup error – config could not be read |
The busy reason is printed to stdout:
busy: scrub running (45%)
busy: balance running
busy: balance paused
busy: device add in progress
busy: device remove in progress
busy: device replace in progress
busy: resize in progress
busy: swap activate in progress
busy: unknown (<probe>: <error>)
Only the scrub line carries a percentage. The named btrfs operation states
come from scanning /sys/fs/btrfs/*/exclusive_operation, which reports the
active operation but not its progress.
busy: unknown (<probe>: <error>) is printed when a probe failed. The
probe label is mountinfo for /proc/self/mountinfo, sysfs for
/sys/fs/btrfs/*/exclusive_operation, or scrub for btrfs scrub status
command/parser failures. The error text preserves the underlying diagnostic.
When the pool is offline (not mounted), exit code is 0 – there is nothing to protect, so suspend is safe.
braid idle must run as root. A non-root invocation exits 1 with
error: braid must be run as root on stderr before config loading or any probe
runs, with no stdout output. The streams disambiguate this from the documented
exits above: exit 0 prints idle: on stdout, busy/probe-failure exit 1 prints
busy: on stdout, and config-load exit 2 emits a config-error diagnostic.
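For scripts that wait for the pool to become idle, the exit code alone is ambiguous (exit 1 covers both a busy pool and a non-root invocation), so the streams need to be checked as well. The following is a minimal sketch of that disambiguation, not part of braid itself. The function names `classify_idle` and `wait_until_idle` and the 30-second poll interval are illustrative assumptions; the stream prefixes are the ones documented above.

```python
import subprocess
import time


def classify_idle(returncode: int, stdout: str, stderr: str) -> str:
    """Map one `braid idle` result onto a label using exit code plus streams."""
    out = stdout.strip()
    if returncode == 0 and out.startswith("idle:"):
        return "idle"  # mounted and idle, or offline
    if returncode == 1 and out.startswith("busy:"):
        return "busy"  # running operation, or a failed probe (busy: unknown ...)
    if returncode == 1 and not out and "must be run as root" in stderr:
        return "not-root"
    if returncode == 2:
        return "config-error"
    return "unexpected"


def wait_until_idle(poll_seconds: float = 30.0) -> None:
    """Block until `braid idle` reports idle; raise on anything that is not busy."""
    while True:
        proc = subprocess.run(["braid", "idle"], capture_output=True, text=True)
        state = classify_idle(proc.returncode, proc.stdout, proc.stderr)
        if state == "idle":
            return
        if state != "busy":
            raise RuntimeError(f"braid idle: {state}: {proc.stderr.strip()}")
        time.sleep(poll_seconds)
```

Note that a probe failure is reported as busy, so a waiting script built this way keeps polling rather than proceeding on an unknown pool state.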
Autosuspend integration
braid idle is the activity check behind braid’s auto-suspend. You don’t write
this check by hand: set braid.autoSuspend.enable = true and braid’s NixOS
module generates the autosuspend
services.autosuspend ExternalCommand check (BraidPool) for you. The
generated command – bash -c '! timeout -k 2 10 braid idle', with fully
qualified /nix/store paths for bash, timeout, and braid – handles the
exit-code inversion autosuspend expects and a fail-closed inner timeout.
Don’t reproduce it by hand: autosuspend runs the check outside braid’s wrapper,
so bare braid/timeout are not on its PATH.
See the power management guide for setup, and
ADR 016: Auto-Suspend for the
exit-inversion table, the qualified-path requirement, and why timeout must
sit inside the !-inverted command.
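The exit-code inversion can be illustrated as a small truth table. This is an illustration only (the generated check should still come from the NixOS module), and it rests on two assumptions not stated on this page: that autosuspend treats exit 0 from an ExternalCommand check as "activity", and that GNU timeout exits 124 when the limit is hit (137 when it has to follow up with KILL). The function name is hypothetical.

```python
def autosuspend_check_exit(inner_exit: int) -> int:
    """Exit status of bash's `! <command>` given the command's own exit status."""
    return 1 if inner_exit == 0 else 0


# Inner exit statuses of `timeout -k 2 10 braid idle` (124/137 are assumed
# GNU timeout conventions, not values documented by braid).
CASES = {
    "idle or offline": 0,
    "busy or probe failure": 1,
    "config error": 2,
    "timed out (TERM)": 124,
    "timed out (KILL)": 137,
}

for label, inner in CASES.items():
    outer = autosuspend_check_exit(inner)
    verdict = "activity, stay awake" if outer == 0 else "no activity, may suspend"
    print(f"{label:24} inner={inner:<3} check={outer} -> {verdict}")
```

Because timeout sits inside the inverted command, a hung probe ends up on the "stay awake" side, which is what fail-closed means here.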
What happens under the hood
- Checks if the pool is mounted (via /proc/self/mountinfo)
- If not mounted: returns idle (exit 0)
- Reads /sys/fs/btrfs/*/exclusive_operation for any active exclusive operation on any btrfs filesystem: balance, balance paused, device add, device remove, device replace, resize, swap activate
- If sysfs reports a busy operation or the sysfs probe fails, returns immediately before probing scrub
- Probes scrub status via btrfs scrub status against the configured pool mount point, only after the sysfs scan is clean (scrub is not in the kernel exclusive-operation set, so sysfs cannot detect it)
When the host has more than one btrfs filesystem (e.g. a btrfs root in addition to the pool), an exclusive op on any of them keeps the system awake while the pool is mounted, and the busy: line above may name an op on the non-pool fs. This is intentionally conservative – see ADR 016: Auto-Suspend. Scrub detection is narrower: braid idle only checks for a scrub on the braid pool itself, so a scrub running on a non-pool btrfs (e.g. the btrfs root) is not detected and does not block suspend.
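The sysfs scan described above can be approximated in a few lines. This is a sketch for illustration, not braid's implementation: the function name is hypothetical, and it assumes the kernel writes the literal string none to exclusive_operation when nothing is running. The root directory is a parameter so the logic can be exercised against a scratch directory.

```python
from pathlib import Path


def scan_exclusive_ops(sysfs_root: str = "/sys/fs/btrfs") -> list[tuple[str, str]]:
    """Return (fsid, operation) for every btrfs filesystem with an active exclusive op.

    Every filesystem directory under sysfs_root is scanned, not only the pool,
    which mirrors the intentionally conservative behaviour described above.
    """
    busy = []
    for op_file in sorted(Path(sysfs_root).glob("*/exclusive_operation")):
        op = op_file.read_text().strip()
        # Assumption: "none" (or an empty file) means no exclusive operation.
        if op and op != "none":
            busy.append((op_file.parent.name, op))
    return busy
```

An empty result only says that no exclusive operation is active; a scrub still has to be probed separately because it never appears in this file.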
Related commands
- braid status – detailed pool state including operation progress
- braid lock – take the pool offline
braid status
Show pool health, per-disk detail, capacity, and operation progress.
When to use it
- After unlocking, to verify everything is healthy
- To check on a running scrub, balance, or replace
- To find device IDs needed by other commands (--missing-id)
- To investigate alerts or degraded state
Basic example
sudo braid status
Common variations
Machine-readable JSON output:
sudo braid status --json
Important flags
| Flag | Purpose |
|---|---|
| --json | Output the full status report as JSON |
Output sections
Pool summary
Pool: /mnt/storage
Status: intact
FSID: <uuid>
Profile:
Data: RAID1
Metadata: RAID1
System: RAID1
Status values:
| Status | Meaning |
|---|---|
| intact | All disks present, no issues |
| DEGRADED (N missing devices) | One or more disks are missing; redundancy is reduced on the missing device’s data |
| not mounted | Pool is offline (LUKS closed or not mounted) |
Profile section:
Profile: summarizes btrfs profiles per block-group type. btrfs profiles are
per type, so Data, Metadata, and System can differ; see
btrfs balance profiles for the
background.
| Per-type rendering | Meaning |
|---|---|
| RAID1 (also RAID1C3, RAID1C4, RAID10) | Mirrored across drives; reads self-heal from the redundant copy. |
| DUP (same-disk copies; no disk redundancy) | Two copies on the same physical device, the default metadata/system profile on a 1-device pool. Survives bit-rot, not device failure. |
| single (no redundancy) (also RAID0 (no redundancy)) | One copy across the affected block groups. Checksums detect bit-rot, but corruption cannot be repaired. |
| single, RAID1 (not fully redundant) | Block groups for this type span more than one profile, typically after an interrupted balance or degraded writes. Run braid doctor for the right next step; doctor recommends a soft RAID1 balance on a healthy pool and braid replace first on a degraded pool. |
| unknown | No block groups of this type were reported. Check braid status advisories for a df probe failure. |
| RAID5, RAID6, or any unrecognized name | braid does not classify parity profiles or future btrfs profiles. The raw profile name is shown verbatim with no annotation so the operator can make their own call; braid only ever produces single, DUP, and RAID1. |
The whole Profile: section is omitted when the pool is not mounted or
when btrfs filesystem df failed.
Alert banner
When a health alert is active, a banner appears at the top of the output:
ALERT -- disk health issue detected. Run 'braid ack' to acknowledge and silence.
- btrfs device errors on toshiba1 (devid 1)
- SMART health warning
Alert causes include btrfs device errors, missing devices, and SMART health warnings. Alerts are latched – they persist until acknowledged with braid ack, even if the underlying condition resolves.
Allocation table
Shows how data is distributed across block group types:
Allocation:
Type Profile Used Allocated
Data RAID1 1.20 TiB 1.50 TiB
Metadata RAID1 512.00 MiB 1.00 GiB
System RAID1 64.00 KiB 32.00 MiB
Capacity
Capacity:
Total: 10.91 TiB (Estimated)
Used: 1.20 TiB
Free: 9.50 TiB
For RAID1, the total is estimated as the effective mirrored capacity (not raw disk sum). With mismatched disk sizes, the oversized portion of the largest drive cannot be fully mirrored. The estimate accounts for this.
Total is omitted when the pool is degraded (the estimate would be misleading with missing devices).
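A worked example makes the mismatched-disk effect concrete. The sketch below uses the common rule of thumb for two-copy RAID1, usable = min(sum / 2, sum - largest); this is an assumption for illustration, and braid's own estimator may differ in detail. The function name is hypothetical.

```python
def raid1_usable_bytes(device_sizes: list[int]) -> int:
    """Estimate usable capacity for two-copy RAID1 across devices of any size.

    Every chunk needs a partner on a different device, so the largest device
    can never hold more mirrored data than all the other devices combined.
    """
    total = sum(device_sizes)
    largest = max(device_sizes)
    return min(total // 2, total - largest)


TB = 10**12
print(raid1_usable_bytes([6 * TB, 6 * TB, 6 * TB]) // TB)   # equal disks: half of raw
print(raid1_usable_bytes([4 * TB, 4 * TB, 12 * TB]) // TB)  # 4 TB of the 12 TB disk is unmirrorable
```

With 4 + 4 + 12 TB the raw sum is 20 TB, but only 8 TB can be mirrored: the two small disks together offer 8 TB of partner space for the large one.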
Drives (compact listing)
Drives:
toshiba1 sda devid=1 present
toshiba2 sdb devid=2 present
toshiba3 - devid=3 missing
Each row shows the disk name, its short kernel device (e.g. sda), its
btrfs devid, and its state. A disk not assembled into the live pool –
missing, offline, or LUKS-mismatched – shows - for its device.
The devid column shows devid=N only when the live pool currently counts
that device missing: a btrfs-MISSING device, or a hot-unplugged member whose
backing device is gone (null-underlying). It falls back to - when no live
devid exists – a persisted devid the live pool no longer counts missing, or a
member with no recorded devid.
That devid is the input to the braid remove-missing --missing-id and
braid replace workflows. As with the JSON missing_devids field, a transient
hot-unplug devid shown here is refused by both braid remove-missing and
braid replace until btrfs promotes the device to MISSING; see
Hot-unplug while pool is mounted.
State values use the same vocabulary as Per-disk detail
below, rendered lowercase and hyphenated in this compact list (e.g. missing,
offline, luks-uuid-mismatch).
Balance progress
Shown only when a balance is running or paused:
Balance: running, 3/10 chunks (30% complete)
Balance: paused, 5/12 chunks (58% complete)
Last scrub result
Last scrub: Mon Jan 1 00:00:00 2024 (no errors)
Last scrub: Mon Jan 1 00:00:00 2024 (3 errors)
Last scrub: Mon Jan 1 00:00:00 2024 cancelled (will resume)
Last scrub: Mon Jan 1 00:00:00 2024 interrupted
Last scrub: never
Last scrub: running (45%)
A nonzero error count replaces (no errors) with (N errors) on a
finished scrub, and prefixes the cancelled (will resume) and
interrupted lines when a partial scrub recorded errors. When the count
is nonzero, braid appends a copyable kernel-journal query for the
per-error detail lines:
Last scrub: Mon Jan 1 00:00:00 2024 (3 errors)
scrub error details:
sudo journalctl -k --since '2024-01-01 00:00:00' --grep 'BTRFS.*(at logical.*on (dev|mirror)|super block at physical)'
The --since argument is the scrub’s start time. See
Scrub reported errors
for how to read the journal output – including corrected vs. uncorrectable
lines and why the count can exceed the visible journal lines.
Per-disk detail
What each disk shows depends on whether it is a live pool member. A live pool
member shows its device path, model, serial, LUKS UUID, btrfs I/O error
counters (the btrfs: line), and a SMART verdict (the SMART: line). These
last two are different layers: btrfs: is the filesystem’s own I/O accounting,
SMART: is the drive’s self-report. Any other disk – missing, offline, UUID
mismatch, header-unreadable, or unknown – shows a reduced set: its device path
and btrfs: unknown (<reason>) / SMART: unknown (<reason>) lines in place of
counters; a UUID-mismatch disk also shows its observed LUKS: UUID so the
divergence is visible. Separately, any disk that needs attention – for example
a missing disk, or a present member with nonzero error counters – gets an
Action: line naming the next command (detailed below).
Disks:
toshiba1 devid 1 present
Device: /dev/disk/by-id/ata-TOSHIBA_MN07ACA12T_1234
Model: TOSHIBA MN07ACA12T
Serial: 1234ABC
LUKS: aaaaaaaa-1111-2222-3333-444444444444
btrfs: read 0 / write 0 / flush 0 / corruption 0 / generation 0
SMART: ok
toshiba2 devid 2 present
Device: /dev/disk/by-id/ata-TOSHIBA_MN07ACA12T_5678
Model: TOSHIBA MN07ACA12T
Serial: 5678DEF
LUKS: bbbbbbbb-1111-2222-3333-444444444444
btrfs: read 12 / write 0 / flush 0 / corruption 3 / generation 0
SMART: warning (2 reallocated)
Action: braid replace --old toshiba2 --new <new-name>=/dev/disk/by-id/<...>
toshiba3 MISSING
Device: /dev/disk/by-id/ata-TOSHIBA_MN07ACA12T_9ABC (not found)
btrfs: unknown (device absent)
SMART: unknown (device absent)
Action: braid replace --old toshiba3 --new <new-name>=/dev/disk/by-id/<...>
Disk states (compact Drives: list and detail view):
| State | Meaning |
|---|---|
| present | Disk is online and healthy |
| MISSING | Disk not found at its by-id path |
| OFFLINE | Disk is present and LUKS identity matches membership, but it is not assembled into the live pool |
| LUKS HEADER UNREADABLE | Device present but LUKS header cannot be read |
| LUKS UUID MISMATCH | Device present but its LUKS header UUID differs from the recorded member – swapped, cloned, or reformatted; run braid doctor |
| UNKNOWN | State could not be determined |
btrfs: line. A live, present pool member shows real btrfs counters
(read / write / flush / corruption / generation). Every other disk shows
btrfs: unknown (<reason>), where <reason> names why counters are
unavailable: device absent, LUKS header unreadable, LUKS UUID mismatch,
disk offline -- not in pool, or metadata unavailable. (This line was
labeled Errors: before braid reported SMART; it was renamed to btrfs: so it
reads as a sibling of the SMART: line, not the only error concept.)
SMART: line. A live, present pool member shows the drive’s SMART verdict:
ok, warning, failing, or unknown. When the drive reports an out-of-spec
attribute, the verdict carries a parenthetical listing the concern(s) – e.g.
warning (2 reallocated) or warning (92 percentage used). The parenthetical
follows the evidence, not the verdict word, so a failing drive whose
attributes braid reads as non-nominal also lists them (failing (5 reallocated)); a bare failing/ok/unknown has no evidence to show. Every
non-present disk shows SMART: unknown (<reason>) with the same reasons as the
btrfs: line. The SMART verdict is independent of the btrfs counters: a drive
can report clean btrfs I/O while SMART reads warning, and vice versa.
Action: line. When a disk needs attention, braid status appends an
Action: line naming the next command, so you do not have to look it up:
| Condition | Action: line |
|---|---|
| Missing member, or a present member with nonzero error counts | braid replace --old <name> --new <new-name>=/dev/disk/by-id/<...> |
| Missing or errored device with no pool membership (foreign mapper) | foreign mapper detected -- run 'braid doctor' to investigate |
| LUKS UUID mismatch | disk was swapped, cloned, or reformatted; detach the foreign disk and reattach the original, or run 'braid replace' if the swap was intentional -- run 'braid doctor' for the expected vs observed UUID |
| LUKS header unreadable | run 'braid doctor' for recovery guidance |
Healthy present disks and disks in the OFFLINE or UNKNOWN state get no
Action: line. These hints are human-output only; --json consumers derive
their own remediation from the status and errors fields (the JSON
disks[] element has no action field).
See braid replace to rebuild a missing or failing disk and braid doctor for the guided recovery path.
Advisories
braid status may print one or more warning: lines above the pool
summary. Each warning corresponds to an entry in the JSON advisories
array.
Foreign filesystem at the mount point. When something other than
the braid pool is mounted at the configured mount point (for example, a
stale tmpfs or ext4 mount left by another tool), braid status
reports Status: not mounted and names the actual filesystem type:
warning: /mnt/storage is mounted but fstype is ext4, not btrfs
Unmount the foreign filesystem before retrying braid unlock –
otherwise unlock reports “pool already mounted” because something is
in fact mounted at that path.
Pending recovery journal. When /var/lib/braid/pending-op.json
exists, an interrupted add / remove / remove-missing / replace
is owed. braid status prints the advisory whether or not the pool is
mounted:
warning: interrupted operation detected (pending-op.json exists, started 2026-05-20T10:30:00Z) -- run 'braid recover' to reconcile
Run sudo braid recover to reconcile from live pool state; do not
remove pending-op.json by hand except under the conditions documented
in Pending-op file corruption.
If the journal is unreadable, the advisory carries the canonical
manual-reconciliation phrase instead – because braid recover cannot
load an unparseable journal either:
warning: failed to parse pending-op.json: <detail>. Remove /var/lib/braid/pending-op.json after manual reconciliation (see docs/internals/luks-unlock.md) and re-run.
See Unparseable state-file reconciliation for the safe-to-remove conditions.
Pending LUKS header backups. When a header-mutating operation
(braid add, braid replace, braid enroll) writes a local LUKS
header backup to /var/lib/braid/luks-headers/<disk>.luksheader,
braid status prints a warning until those files are removed:
warning: LUKS header backups exist in /var/lib/braid/luks-headers -- copy offsite and delete local copies
The local copy is a transient byproduct of the header-mutating
operation, not the intended backup target. Copy each .luksheader
file to an off-system location (USB, another machine, cloud key
storage), then remove the local copy to silence the warning.
See LUKS header backup workflow for the full rationale.
ENOSPC risk on RAID1 pool. When an intact mounted pool is one
disk-loss away from insufficient RAID1 chunk-pair space, braid status
prints:
warning: ENOSPC risk: 1 of 3 devices have less than 1.00 GiB unallocated -- if a disk fails, the pool may be unable to allocate RAID1 chunks to restore redundancy. Add capacity with 'braid add', delete unneeded files or snapshots, or compact data chunks with 'btrfs balance start -dusage=50 <mount>' (data only; do not balance metadata).
For 2-disk pools, the warning fires when either device drops below the
threshold because new RAID1 chunks need space on both devices. For 3+
device pools, braid simulates each possible single-disk loss and warns
when any survivor set would have too little pairable unallocated space.
The per-device threshold is min(1 GiB, 10% of total device bytes),
matching btrfs’s effective data chunk size.
See Balance fails with No space left on device for recovery options.
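The per-device threshold and the 2-disk rule can be checked by hand. The sketch below covers only those two pieces; the single-disk-loss simulation for 3+ device pools is not reproduced. Function and parameter names are hypothetical, and total_device_bytes is passed through exactly as the formula above names it.

```python
GIB = 1024**3


def enospc_threshold(total_device_bytes: int) -> int:
    """Per-device unallocated-space threshold: min(1 GiB, 10% of total device bytes)."""
    return min(GIB, total_device_bytes // 10)


def two_disk_pool_at_risk(unallocated: list[int], total_device_bytes: int) -> bool:
    """2-disk rule: warn when either device is below the threshold,
    because a new RAID1 chunk needs space on both devices."""
    threshold = enospc_threshold(total_device_bytes)
    return any(free < threshold for free in unallocated)


print(enospc_threshold(12 * 10**12) == GIB)                      # large pools cap at 1 GiB
print(two_disk_pool_at_risk([500 * GIB, GIB // 2], 12 * 10**12))  # one device below 1 GiB
```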
Config-disk probe fault. While building per-disk detail, braid status
probes every configured member’s LUKS header and its expected braid-<name>
mapper. When that probe fails – a braid-<name> mapper hijacked by a foreign
container, a backing-path mismatch, a LUKS1 header, or an
unreadable/unspawnable cryptsetup call – the fault is recorded as an advisory
naming the affected disk:
warning: disk 'disk2' mapper '/dev/mapper/braid-disk2' is open but not backed by the configured disk. Expected LUKS UUID ..., found ...
Unlike the mutating commands (add, replace, enroll), which fail closed on
such a fault, status is the always-available read-only diagnostic: it stays
non-fatal (exit 0) and still prints the full pool summary, capacity, and
per-disk detail. The fault degrades a single member, not the whole report.
A member already live and healthy in the pool keeps its present row – its
identity comes from the LUKS-UUID membership join, which tolerates mapper drift
(decision 024) – and the advisory is its only flag. Only an affected member
that is not live in the pool additionally gets an unknown disk row, so it
is neither silently dropped from the detail section nor mislabeled missing in
the compact Drives: list.
JSON output
--json produces a structured report suitable for monitoring tools. Key fields:
- mount_point: the pool’s configured mount path (e.g. /mnt/storage) – the same value shown on the human-readable Pool: line. Always present, in both the mounted and not-mounted envelopes.
- status: "intact", "degraded", or "not_mounted"
- total_devices: total number of devices btrfs reports for the pool, as a number. Present when the pool is mounted; omitted in the not-mounted envelope.
- present_count: number of member devices currently present, equal to total_devices - missing_count, as a number. Present when the pool is mounted; omitted in the not-mounted envelope.
- missing_count: number of member devices counted as missing – the cardinality of the missing_devids array below (btrfs-MISSING devices plus null-underlying mappers whose backing device disappeared); 0 on a healthy pool. Present when the pool is mounted; omitted in the not-mounted envelope.
- fsid: the btrfs filesystem UUID, as a string – the same value shown on the human-readable FSID: line, and distinct from a disk’s luks_uuid. Present when the pool is mounted (a mounted btrfs filesystem always has an FSID); omitted in the not-mounted envelope.
- disks: array of per-disk reports – one element per disk braid knows about: present pool members (matched members and foreign live devices), plus configured disks that are not currently live pool members (reported as missing, offline, unknown, luks-header-unreadable, or luks-uuid-mismatch; see the status values below). The field list below describes a live pool member element (as in the example); diagnostic unpooled elements differ as called out per field and in the note after the example.
  - luks_uuid: the disk’s LUKS UUID – the persistent member identity. For a matched live pool member it equals the pool.json membership key; a foreign live pool device carries an observed UUID that is not in membership (paralleling its mapper-basename name). A luks-uuid-mismatch diagnostic row carries the observed on-disk UUID so the divergence is visible. Other non-live rows (missing, offline, unknown, luks-header-unreadable) report ""; correlate them by name, not luks_uuid.
  - name: operator-facing name (e.g. toshiba1). For a matched present member it is resolved via the UUID-keyed membership join; for a foreign present device it falls back to the mapper basename; for a non-present disk it is the configured name. For display/command selection, not identity.
  - by_id: stable /dev/disk/by-id/... hardware path – a runtime handle, not identity.
  - mapper: device-mapper name – a runtime handle, not identity. For a present pool member it is the observed live mapper; for a matched member that is normally braid-<name> but may have drifted (decision 024 tolerates mapper drift), so do not reconstruct it as braid-${name} or you will miss the drift. For a non-present disk (missing, offline, unknown, luks-header-unreadable, luks-uuid-mismatch) braid does not report an observed mapper, so it emits the expected braid-<name> derived from the configured name, paralleling the configured name and by_id on those rows.
  - underlying: current backing block device (e.g. /dev/sda), or null when the disk is not a live pool member.
  - devid: btrfs device ID as a number (e.g. 1), or null when the disk is not a live pool member.
  - status: one of present, missing, luks-header-unreadable, luks-uuid-mismatch, offline, unknown.
  - btrfs_errors: btrfs I/O error counters (read, write, flush, corruption, generation, all integers) – the filesystem’s I/O accounting. Present when btrfs device stats are available; omitted entirely otherwise – including for present disks when btrfs device stats fails (which also emits a btrfs device stats failed advisory). (This field was named errors before braid reported SMART; it was renamed so it reads as a sibling of smart, not the only error concept.)
  - smart: the drive’s own SMART self-report – a verdict plus supporting evidence, a different layer from btrfs_errors. Present for live pool members; omitted for disks with no backing path to probe. The object always carries health ("ok", "warning", "failing", or "unknown"). When SMART evidence is available it also carries a protocol discriminator ("sata" or "nvme") and the per-protocol counters – for SATA reallocated_sectors, pending_sectors, offline_uncorrectable; for NVMe media_errors, critical_warning, percentage_used, available_spare, available_spare_threshold – plus celsius when the drive reports a current temperature. A drive whose detail log is absent (or whose health is unknown) carries health alone. This field is diagnostic evidence only – it does not feed the alert latch (see the note under alert_causes).
{
"name": "toshiba1",
"mapper": "braid-toshiba1",
"by_id": "/dev/disk/by-id/ata-TOSHIBA_MN07ACA12T_1234",
"luks_uuid": "aaaaaaaa-1111-2222-3333-444444444444",
"devid": 1,
"underlying": "/dev/sda",
"status": "present",
"btrfs_errors": { "read": 0, "write": 0, "flush": 0, "corruption": 0, "generation": 0 },
"smart": { "health": "ok", "protocol": "sata", "reallocated_sectors": 0, "pending_sectors": 0, "offline_uncorrectable": 0, "celsius": 26 }
}
A diagnostic unpooled disk (missing, offline, unknown, or luks-header-unreadable) reports "luks_uuid": "", "devid": null, "underlying": null, and no btrfs_errors or smart key. offline is present but not assembled; the others reach the same blank/null row shape because no live member row is available. Correlate these rows by name.
A config-disk probe fault (see the Advisories section above) on a member that is not live in the pool produces an unknown row of this shape – the advisory carries the cause. A member that is live keeps its present row and is flagged by the advisory alone, so not every probe fault adds an unknown row.
- alert_active: boolean
- alert_causes: array of alert cause objects. Omitted entirely when no alert is active (the key is absent, not []) – check the always-present alert_active boolean first, mirroring how advisories is “omitted when none”. When present, each object is tagged by a type discriminator:
  - { "type": "btrfs_device_errors", "devid": <number> } – btrfs I/O errors on that device.
  - { "type": "missing_device", "devid": <number> } – a device counted as missing.
  - { "type": "smartd_alert" } – a SMART health warning from smartd.
  - { "type": "computation_error", "detail": "<string>" } – braid could not compute alert state; detail explains.
The per-disk smart field does not feed the alert latch. The smartd_alert cause is driven by the smartd daemon’s flag (/var/lib/braid/smartd-alert; see ADR 014 and ADR 030), not by the live per-disk SMART probe status runs. So a report can carry a degraded smart object ("health": "warning") while alert_active is false and no smartd_alert cause is present. This is intentional: the per-disk smart field is diagnostic evidence; smartd remains the alert source.
- advisories: array of human-readable advisory strings (omitted when none). See the Advisories section above for what currently produces them.
- missing_devids: array of every devid counted in missing_count (btrfs-MISSING devices and null-underlying mappers whose backing device has disappeared). For destructive remove-missing / replace --missing-id workflows, see those commands’ notes – a null-underlying devid here will be rejected by those commands until btrfs promotes it to MISSING.
- profile: object with data, metadata, and system arrays, present whenever btrfs reports block-group allocation and omitted when the pool is not mounted or btrfs filesystem df failed. Each array contains raw btrfs profile names such as single, DUP, RAID0, RAID1, RAID1C3, RAID1C4, RAID5, RAID6, RAID10, or an unrecognized name verbatim. Arrays use canonical domain order, not alphabetical order, so mixed data is ["single", "RAID1"], not ["RAID1", "single"]. An empty array means btrfs reported no block groups of that type.
- capacity: total_bytes, used_bytes, free_bytes
- allocation: array of block-group entries, one per allocated type. Each entry has bg_type (e.g. Data, Metadata, System), profile (raw btrfs profile name, same vocabulary as profile above), used_bytes, and allocated_bytes (both integers). Omitted when the pool is not mounted or btrfs filesystem df failed.
3-disk RAID1 profile:
"profile": {
"data": ["RAID1"],
"metadata": ["RAID1"],
"system": ["RAID1"]
}
Single-disk bootstrap profile:
"profile": {
"data": ["single"],
"metadata": ["DUP"],
"system": ["DUP"]
}
Mixed data after interrupted balance:
"profile": {
"data": ["single", "RAID1"],
"metadata": ["RAID1"],
"system": ["RAID1"]
}
The human-facing redundancy annotations from the text output, such as
(no redundancy), (same-disk copies; no disk redundancy), and
(not fully redundant), do not appear in JSON. The JSON payload carries only
the btrfs profile names braid observed; consumers apply their own policy.
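Since the JSON carries only raw profile names, a consumer has to supply its own redundancy policy. The sketch below is one possible policy, loosely following the per-type rendering table earlier on this page. The labels, set names, and function name are hypothetical, and treating every multi-profile array as "mixed" is this sketch's choice rather than documented braid behaviour.

```python
MIRRORED = {"RAID1", "RAID1C3", "RAID1C4", "RAID10"}
NO_REDUNDANCY = {"single", "RAID0"}


def classify_profiles(names: list[str]) -> str:
    """Reduce one profile array (data, metadata, or system) to a policy label."""
    if not names:
        return "unknown"           # btrfs reported no block groups of this type
    if len(names) > 1:
        return "mixed"             # e.g. ["single", "RAID1"] after an interrupted balance
    name = names[0]
    if name in MIRRORED:
        return "mirrored"
    if name == "DUP":
        return "same-disk copies"  # survives bit-rot, not device failure
    if name in NO_REDUNDANCY:
        return "no redundancy"
    return "unclassified"          # RAID5, RAID6, or a future profile name


profile = {"data": ["single", "RAID1"], "metadata": ["RAID1"], "system": ["RAID1"]}
print({bg_type: classify_profiles(names) for bg_type, names in profile.items()})
```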
- balance: state object (idle, running, paused, unknown)
- last_scrub: state object (never, running, finished, aborted, interrupted, unknown). For finished, aborted, and interrupted, started_at is an offset-free host-local ISO-8601 wall-clock timestamp (YYYY-MM-DDTHH:MM:SS) as reported by btrfs. It records Scrub started, or Scrub resumed after a resumed scrub, and is not directly comparable to UTC fields such as pending-operation started_at values ending in Z. The same three states also carry error_count (integer) – the count btrfs reported, the same number the text output renders as (N errors). The scrub error details: journalctl command from the text output is not part of the JSON (mirroring the profile annotations above); a --json consumer derives its own --since value from started_at.
A complete report for a healthy 3-disk RAID1 pool (the disks array is abbreviated to a single element here):
{
"mount_point": "/mnt/storage",
"status": "intact",
"total_devices": 3,
"present_count": 3,
"missing_count": 0,
"profile": {
"data": ["RAID1"],
"metadata": ["RAID1"],
"system": ["RAID1"]
},
"fsid": "f5f5f5f5-aaaa-bbbb-cccc-d0d0d0d0d0d0",
"capacity": {
"total_bytes": 18000000000000,
"used_bytes": 6000000000000,
"free_bytes": 12000000000000
},
"last_scrub": {
"state": "finished",
"started_at": "2026-05-01T03:00:00",
"error_count": 0
},
"balance": { "state": "idle" },
"allocation": [
{ "bg_type": "Data", "profile": "RAID1", "used_bytes": 6000000000000, "allocated_bytes": 6500000000000 },
{ "bg_type": "Metadata", "profile": "RAID1", "used_bytes": 8000000000, "allocated_bytes": 9000000000 },
{ "bg_type": "System", "profile": "RAID1", "used_bytes": 65536, "allocated_bytes": 33554432 }
],
"disks": [
{
"name": "toshiba1",
"mapper": "braid-toshiba1",
"by_id": "/dev/disk/by-id/ata-TOSHIBA_MN07ACA12T_1234",
"luks_uuid": "aaaaaaaa-1111-2222-3333-444444444444",
"devid": 1,
"underlying": "/dev/sda",
"status": "present",
"btrfs_errors": { "read": 0, "write": 0, "flush": 0, "corruption": 0, "generation": 0 },
"smart": { "health": "ok", "protocol": "sata", "reallocated_sectors": 0, "pending_sectors": 0, "offline_uncorrectable": 0, "celsius": 26 }
}
],
"alert_active": false
}
When the pool is not mounted, every mounted-only field above (total_devices, present_count, missing_count, profile, fsid, capacity, last_scrub, balance, allocation) is omitted, leaving mount_point, status ("not_mounted"), disks ([]), and alert_active. advisories and alert_causes still follow their skip-when-empty rule, so a latched alert or a pending-operation advisory can still appear on an offline pool.
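Because several keys are omitted rather than emitted empty, a monitoring consumer should read them defensively. The following sketch shows one way to reduce a report to the fields an alerting script typically needs. The function name and the shape of the returned summary are hypothetical; the key names are the ones documented above.

```python
def summarize_status(report: dict) -> dict:
    """Reduce a `braid status --json` report, tolerating omitted keys."""
    disks = report.get("disks", [])
    return {
        "status": report["status"],
        "mounted": report["status"] != "not_mounted",
        # Non-live rows report luks_uuid "", so they are correlated by name.
        "not_present": [d["name"] for d in disks if d["status"] != "present"],
        # btrfs_errors is omitted when device stats are unavailable.
        "btrfs_errors": [
            d["name"] for d in disks
            if any(count > 0 for count in d.get("btrfs_errors", {}).values())
        ],
        # Diagnostic only: the smart field does not feed the alert latch.
        "smart_not_ok": [
            d["name"] for d in disks
            if "smart" in d and d["smart"]["health"] != "ok"
        ],
        # alert_causes is absent (not []) when no alert is active.
        "alert_causes": [c["type"] for c in report.get("alert_causes", [])],
        "advisories": report.get("advisories", []),
    }


offline = {"mount_point": "/mnt/storage", "status": "not_mounted",
           "disks": [], "alert_active": False}
print(summarize_status(offline))
```

The report itself would come from parsing the stdout of sudo braid status --json with json.loads.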
Related commands
- braid unlock – bring the pool online
- braid replace – repair a degraded pool
- braid remove-missing – forget a dead device (operates only on btrfs-authoritative MISSING devids; see that command’s note on transient null-underlying state)
- braid doctor – diagnose pool/disk health and get recovery guidance
- braid idle – machine-friendly idle/busy check for autosuspend
braid doctor
Runs diagnostic checks on your braid configuration, pool health, RAID profile consistency, LUKS headers, auto-suspend wake path, and alerting hardware. Reports issues and suggests fixes.
When to use it
- After initial setup, to verify everything is wired correctly.
- Periodically, to catch drift (missing disks, mixed RAID profiles, broken alert speaker).
- When something seems wrong and you want a quick health summary.
Basic example
sudo braid doctor
Output:
[ok] config file /etc/braid/config.json exists and is valid JSON
[ok] config schema required fields present and valid
[ok] config perms /etc/braid/config.json permissions ok
[ok] declared disks all 3 declared disks present
[ok] missing devs no missing devices
[ok] enospc risk per-device unallocated space healthy
[ok] foreign uuids no foreign LUKS UUIDs in live pool
[ok] data profiles data profile: RAID1
[ok] meta profiles metadata profile: RAID1
[ok] system profiles system profile: RAID1
[ok] meta pressure metadata pressure within bounds
[ok] paused balance no paused balance
[ok] smart selftest disk1 passed ~2 days ago
[ok] smart selftest disk2 passed ~12 days ago
[ok] smart selftest disk3 passed ~30 days ago
[skip] alert beep skipped (pass --beep to play the audible alert test beep)
[skip] ups daemon skipped (braid.ups not enabled)
[skip] braid-online skipped (braid.ups not enabled)
[skip] wake-on-lan skipped (braid.autoSuspend not enabled)
The SMART self-test check emits one row per pool drive. If a drive has no recent completed self-test, the row includes a paste-ready smartctl command:
[warn] smart selftest disk2 no completed SMART self-test recorded -- run: smartctl -t short /dev/disk/by-id/...
The hint uses the stable by-id path: braid’s own diagnostic read prefers the member’s live backing device, but a smartctl -t short you run later should use by-id, which survives reboots and controller reordering.
To test the real alert sound:
sudo braid doctor --beep
Machine-readable output
sudo braid doctor --json
Prints a JSON object with status (one of ok, warn, fail, skip) and a checks array. Each check has name, status, and message. Per-drive checks also include subject.
--json mode never plays the alert beep test. The check still appears in the report as skip. --json and --beep conflict; run a separate sudo braid doctor --beep when you want to test the audible alert path.
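A script consuming the doctor report mostly needs the non-ok checks, keyed by drive where a subject is present. The sketch below assumes only the fields described above (status, checks, and per-check name, status, message, optional subject); the function name and the label format are hypothetical.

```python
def problem_checks(report: dict) -> list[tuple[str, str, str]]:
    """Return (status, label, message) for every warn or fail check."""
    problems = []
    for check in report["checks"]:
        if check["status"] not in ("warn", "fail"):
            continue
        # subject is present only on per-drive results, so test for it first.
        subject = check.get("subject")
        label = check["name"] if subject is None else f'{check["name"]}[{subject}]'
        problems.append((check["status"], label, check["message"]))
    return problems


sample = {
    "status": "warn",
    "checks": [
        {"name": "config_file", "status": "ok", "message": "exists and is valid JSON"},
        {"name": "smart_self_test", "status": "warn", "subject": "disk2",
         "message": "no completed SMART self-test recorded"},
        {"name": "beep_path", "status": "skip", "message": "skipped"},
    ],
}
print(problem_checks(sample))
```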
What it checks
| Check | What it does |
|---|---|
| config_file | Config exists and is valid JSON |
| config_schema | Required fields present and deserializable |
| config_permissions | Canonical /etc/braid/config.json is not world-writable and is owned by root; custom --config paths skip this check |
| declared_disks | Every UUID-keyed pool.json member is present, is a block device, has a readable LUKS header, its live LUKS UUID matches the pool.json key, and, when the pool is mounted, is assembled into the live btrfs pool. Warn if a member is missing, is not a block device, has an unreadable LUKS header or probe failure, is present and identity-verified but not assembled into the live pool (offline), or the pool is mounted but its live topology cannot be probed to verify assembly; Fail if a member’s live LUKS UUID does not match its pool.json key. |
| pool_missing_devices | No btrfs missing devices in the live pool |
| enospc_risk | Warns when the pool is one disk-loss away from insufficient RAID1 chunk-pair space. Per-device threshold scales with pool size (min(1 GiB, 10% of total device bytes), matching the kernel’s effective data chunk size) |
| foreign_luks_uuid | Fail when the live (mounted) pool contains a btrfs device whose LUKS UUID is not declared in pool.json (a foreign disk). The message pairs each foreign UUID and its mapper with a paste-ready btrfs device remove /dev/mapper/<mapper> <mount> then cryptsetup close <mapper> recipe – the observed mapper name and pool mount point are substituted in, and multiple foreign disks each get their own recipe. Skipped when the pool is not mounted. |
| data_profile_mismatch | Data block groups all use the same RAID profile |
| metadata_profile_mismatch | Metadata block groups all use the same RAID profile |
| system_profile_mismatch | System block groups all use the same RAID profile |
| metadata_enospc_pressure | Warns when metadata is near the next allocation threshold and fewer than two RAID1 devices have enough unallocated space for the next metadata chunk |
| paused_balance | Warns if a btrfs balance is paused on the mounted pool (e.g. a prior balance interrupted by reboot, manual pause, or kernel pause) and suggests resuming with btrfs balance resume <mount>. |
| smart_self_test | One result per pool drive: runs smartctl --json -A -l selftest <device> against each – <device> is the member’s live backing device (e.g. /dev/sda) when it is assembled into the mounted pool, otherwise its persisted by-id path (pool offline, probe failed, or that member not currently assembled – e.g. missing or hot-unplugged on a degraded mount) – then reports Fail on an active SMART self-test failure, Warn if no completed test in the last 90 powered-on days (or never), Ok otherwise, or Skip for NVMe/SCSI/unsupported drives. In --json, every per-drive result carries name: "smart_self_test" and a subject field naming the pool member; if pool membership is missing or empty, a single Skip result with name: "smart_self_test" is emitted; if pool membership is corrupt or unreadable, a single Warn result with the same name is emitted instead. In both fallbacks the subject field is omitted. Scripts should check whether subject is present before keying on it. |
| beep_path | PC speaker alert beep is configured; with --beep, the alert beep command succeeds |
| ups_daemon | With UPS enabled, upsc is available and can query the UPS daemon; missing or spawn-failed upsc is a failure, daemon unreachable/non-zero upsc is a warning |
| braid_online_active | With UPS enabled and the pool mounted, braid-online.service is active so shutdown unmounts the pool. Standalone CLI installs (no NixOS module) skip this – there is no braid-online.service to verify. |
| wake_on_lan | With auto-suspend enabled, ethtool <interface> reports magic-packet wake support and active Wake-on: g; disabled, unsupported, missing, or unparseable WoL state is a failure |
Flags
| Flag | Effect |
|---|---|
| --json | Machine-readable JSON output; never plays the alert beep test |
| --beep | Play the audible alert test beep; conflicts with --json |
Exit codes
- 0 – all checks passed (ok/warn/skip)
- 1 – at least one check failed
What happens under the hood
- Reads and validates /etc/braid/config.json.
- Loads UUID-keyed pool.json and probes each declared disk via cryptsetup isLuks and cryptsetup luksUUID.
- If the pool is mounted, queries btrfs filesystem df and btrfs device usage --raw to check RAID profile consistency and metadata allocation headroom, probes for missing devices, reconciles each live pool member’s LUKS UUID against pool.json to flag foreign devices, and runs btrfs balance status to detect paused balances.
- For each declared disk, runs smartctl --json -A -l selftest <device> – the member’s live backing device when it is assembled into the mounted pool, otherwise its persisted by-id path (including a member that is missing or unassembled on a degraded but mounted pool) – and parses the self-test log to detect active failures and report the age of the most recent passing entry. See ADR-024 for why present members are probed by live path rather than by-id.
- If the braid monitor NixOS module is configured, reports the alert beep check as skipped by default.
- With --beep, plays a short test beep through the canonical beep wrapper.
- If UPS support is enabled, checks upsc and the mounted-pool braid-online.service shutdown hook.
- If auto-suspend is enabled, runs ethtool <interface> to verify runtime Wake-on-LAN state.
- Aggregates results and prints a summary.
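The smart_self_test outcome rules described above can be sketched as a small classifier. This is an illustration under assumptions, not braid's implementation: the function and parameter names are invented, "90 powered-on days" is taken to mean 90 × 24 power-on hours, and the exact boundary behavior at 90 days is a guess.

```python
SELF_TEST_MAX_AGE_HOURS = 90 * 24  # assumed reading of "90 powered-on days"


def classify_self_test(supported, active_failure, power_on_hours, last_test_hours):
    """Map one drive's self-test state to a doctor status string.

    last_test_hours is the drive's power-on hour count at its most recent
    completed self-test, or None if no test has ever completed.
    """
    if not supported:
        return "skip"  # NVMe, SCSI, or otherwise unsupported drive
    if active_failure:
        return "fail"
    if last_test_hours is None:
        return "warn"  # never tested
    if power_on_hours - last_test_hours > SELF_TEST_MAX_AGE_HOURS:
        return "warn"  # last completed test is too old
    return "ok"


print(classify_self_test(True, False, 5000, 4000))
```

Measuring age in power-on hours rather than wall-clock time means a drive that has been switched off for months is not flagged merely for being idle.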
Related commands
braid monitor
Note
Experimental 🧪
This command is experimental: the idea or implementation is still uncertain and may be removed, replaced, or overhauled before braid v1.0.
Checks btrfs device error stats, missing devices, and SMART alerts. Designed to be run automatically by a systemd timer (every 5 minutes by default). Exits with a status code that drives the alert pipeline.
When to use it
You normally don’t run this by hand – the braid-monitor.timer systemd unit runs it automatically. Use it directly when debugging the alert system or testing your monitoring setup.
Basic example
sudo braid monitor
No output on success. Check the exit code:
sudo braid monitor; echo $?
Exit codes
| Code | Meaning |
|---|---|
| 0 | Healthy, pool is offline, or another braid command holds the pool lock (cycle skipped, re-evaluated on the next timer tick) |
| 1 | Alert active – one or more problems detected |
| 2 | Pre-monitor setup error (e.g. pool-lock I/O, config load failure) |
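When debugging the timer by hand, a wrapper script can translate the exit code into the meanings from the table above. The helper name and the wording of the strings are illustrative assumptions; only the three codes themselves come from this page.

```python
MONITOR_EXIT_MEANINGS = {
    0: "healthy, pool offline, or cycle skipped because the pool lock was held",
    1: "alert active",
    2: "pre-monitor setup error",
}


def describe_monitor_exit(code):
    """Translate a braid monitor exit code into a short description."""
    return MONITOR_EXIT_MEANINGS.get(code, f"undocumented exit code {code}")


# In a real script the code would come from something like:
#   subprocess.run(["braid", "monitor"]).returncode
print(describe_monitor_exit(1))
```

Note that exit 0 alone does not prove the pool was checked: an offline pool or a lock-contended cycle also exits 0.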
What triggers an alert (exit 1)
- btrfs device errors – any device in the pool has read, write, flush, corruption, or generation errors above the acknowledged baseline, including errors discovered during scrub.
- Missing device – btrfs reports a device as missing or a pool device has a null underlying path.
- SMART alert – smartd has written a SMART alert flag (via the braid smartd notifier).
- Computation error – a probe, parse, btrfs device stats call, mountinfo read, acked-stats baseline load, acked-stats save during self-heal, or alert-latch load/quarantine failed. Monitor fails closed: it latches a ComputationError cause so the beeper fires and braid status shows the detail.
Flags
None. Monitor has no flags – it reads from the braid config and state files.
What happens under the hood
- Checks if the pool is mounted. If not, exits 0 (nothing to monitor).
- Runs btrfs device stats on the pool mount point.
- Loads the acknowledged-stats baseline (acked-stats.json) from a previous braid ack. If the file is unreadable or unparseable, monitor fails closed – it latches a ComputationError rather than firing every acknowledged cause against an empty baseline.
- Self-heals stale ack state before computing alerts: prunes baseline entries for devices no longer in the pool, and clears the missing-acked flag for any device that was acknowledged missing but is now present again. If the baseline changed, the updated acked-stats.json is written immediately; a write failure (e.g. EROFS, ENOSPC) is itself a fail-closed ComputationError.
- Computes alert causes against the reconciled baseline: btrfs device errors above the baseline, missing/null-underlying devices, and the smartd alert flag.
- Merges the causes into the alert latch (alert-latch.json). The latch is sticky: once an alert fires, it stays active until braid ack clears it.
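The prune, compare, and merge steps above can be sketched with plain dictionaries and sets. The data shapes here (device name mapped to counter name mapped to count, and causes as tuples) are assumptions made for illustration; the real layouts of acked-stats.json and alert-latch.json are not described on this page.

```python
def prune_baseline(baseline, live_devices):
    """Self-heal: drop baseline entries for devices no longer in the pool."""
    live = set(live_devices)
    return {dev: counts for dev, counts in baseline.items() if dev in live}


def causes_above_baseline(live_stats, baseline):
    """Return (device, counter) pairs whose live count exceeds the acknowledged count."""
    causes = []
    for dev, counters in sorted(live_stats.items()):
        acked = baseline.get(dev, {})
        for name, value in sorted(counters.items()):
            if value > acked.get(name, 0):
                causes.append((dev, name))
    return causes


def merge_latch(latched, live_causes):
    """Sticky merge: causes are only ever added here, never removed."""
    return set(latched) | set(live_causes)


baseline = {"sda": {"read": 2}, "gone": {"write": 1}}
live = {"sda": {"read": 2, "write": 1}, "sdb": {"read": 0}}
pruned = prune_baseline(baseline, live)
print(merge_latch({("sdb", "read")}, causes_above_baseline(live, pruned)))
```

Because the merge is a union, a counter that stops increasing does not clear its latched cause; only braid ack does that.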
Alert pipeline
braid monitor      --writes--> alert-latch.json --> braid status / braid tui (display)
(timer, every 5m)  --exit 1--> braid-alert.service (beeper + alertCommand)
smartd             --start-->  braid-alert.service (beeper)
                   --writes--> smartd-alert --> next braid monitor cycle (latches SmartdAlert)
On exit 1, the braid-monitor.service wrapper starts braid-alert.service (the beeper, plus any alertCommand). After that, two things stay active until you braid ack, each held by a different mechanism:
- The latch and exit 1 – held by monitor. Each cycle it writes the live causes to alert-latch.json, merging them into the existing latch, and re-exits 1 while any cause remains. braid status and the TUI read the same file for display.
- The beep – held by braid-alert.service itself, not the read-back. Once started it stays active on its own (the backoff beep loop when beep is enabled, or a RemainAfterExit oneshot when it’s off), so the wrapper’s per-cycle systemctl start is a no-op and a skipped cycle (offline or lock-contended exit 0) does not silence it. The service never reads alert-latch.json or the smartd-alert flag.
smartd is a second, independent trigger: on a SMART fault it starts braid-alert.service directly and writes the smartd-alert flag, which the next monitor cycle latches as a SmartdAlert cause.
The beep stops only when braid ack clears the latch and runs systemctl stop braid-alert.service.
Related commands
- ack – acknowledge alerts and silence the beeper
- doctor – one-time diagnostic; pass --beep to test the alert beep
- status – shows active alerts in the status output
Related guides
braid ack
Note
Experimental 🧪
This command is experimental: the idea or implementation is still uncertain and may be removed, replaced, or overhauled before braid v1.0.
Acknowledges active alerts and silences the PC speaker beeper. When there is an active alert source on a mounted pool, it also sets the current device error counts as the new baseline so the same condition won’t re-trigger.
When to use it
- The beeper is going off and you’ve investigated the cause.
- braid status or braid tui shows active alerts you’ve already addressed.
- After replacing a disk or running a scrub to clear errors.
Basic example
sudo braid ack
Output:
acknowledged 3 alerts
If there’s nothing to acknowledge:
no active alerts
What happens under the hood
- Reads the alert latch to determine how many alerts are active.
- If the pool is mounted:
  - If a latch entry exists, the smartd alert flag is present, or the latch is corrupt, snapshots the current btrfs device stats error counters and missing-device state.
  - Writes that snapshot as the new acknowledged baseline (acked-stats.json). Future monitor runs compare against this baseline, so the same error counts won’t trigger again.
  - If none of those alert sources is present, exits 0 with no active alerts and does not query btrfs or rewrite acked-stats.json.
- Stops braid-alert.service (the beeper), best-effort. This runs first so the stop attempt is reached before any later file-removal I/O error can short-circuit the rest of cleanup.
- Removes the smartd alert flag (smartd-alert) if present.
- Removes the alert latch file (alert-latch.json).
- Removes the corrupt-latch sidecar (alert-latch.json.corrupt) if present.
On a cleanup I/O error, ack preserves retry state so the next braid ack resumes cleanup after the I/O fault is fixed.
When ack reaches cleanup and a later cleanup step fails, it leaves /var/lib/braid/alert-cleanup-pending. braid status surfaces ack cleanup pending -- re-run `braid ack` to resume as an alert cause until cleanup finishes. If that sentinel is the only remaining alert signal, the next braid ack re-enters cleanup directly (no btrfs probe, no baseline rewrite) and prints acknowledged current alerts on success – expected output because only leftover cleanup ran.
If the pool is offline but alerts exist (e.g., a latched smartd alert), ack still clears the latch and flag without snapshotting device stats. Offline means there is no mount at the configured mount point. If that path is occupied by a non-btrfs filesystem, braid ack returns a probe error naming the fstype and preserves alert-latch.json, smartd-alert, and acked-stats.json.
Flags
None.
Safety checks
- If the pool is not mounted and no alerts are latched, ack refuses with “pool is not mounted – nothing to acknowledge”
- If the pool is mounted but healthy with no latch entries, no smartd alert flag, and no corrupt latch, ack is a no-op and does not mutate acked-stats.json
- If the configured mount point is mounted as something other than btrfs, ack refuses with the fstype mismatch and does not clear or rewrite alert state
- If another braid operation holds the pool lock (/run/braid-pool.lock), waits up to 10 seconds for it to finish: proceeds if the lock frees within that window, otherwise exits 1 with the pool-lock retry message.
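The main branches of ack can be summarized as a decision function. This is a simplified sketch: the function name and return strings are invented, the cleanup-pending sentinel, the fstype mismatch, and pool-lock handling are left out, and treating a corrupt latch as an alert source while the pool is offline is an assumption.

```python
def ack_plan(mounted, latch_entry, smartd_flag, corrupt_latch):
    """Return which documented ack path applies to the given state."""
    has_source = latch_entry or smartd_flag or corrupt_latch
    if mounted:
        if has_source:
            return "snapshot baseline, then clean up"
        return "no-op: no active alerts"
    if has_source:
        return "clean up without snapshot"
    return "refuse: pool is not mounted"


print(ack_plan(mounted=True, latch_entry=True, smartd_flag=False, corrupt_latch=False))
```

The baseline snapshot happens only on the mounted path because it needs live btrfs device stats.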
Related commands
- monitor – the automated check that triggers alerts
- status – view active alerts
- tui – interactive dashboard shows alert state
Related guides
braid enroll
Note
Experimental 🧪
This command is experimental: the idea or implementation is still uncertain and may be removed, replaced, or overhauled before braid v1.0.
Enrolls a binary keyfile into LUKS slot 1 on all pool disks. Used to set up USB auto-unlock: plug in a USB drive with the keyfile, and braid unlock --key-file can open the pool without typing a passphrase.
When to use it
- Setting up unattended unlock via USB keyfile.
- After adding a new disk to the pool (enroll the keyfile on it too).
Basic example
Generate a new keyfile on a USB drive and enroll it on all pool disks:
sudo braid enroll /mnt/usb --generate
/mnt/usb must already exist and be mounted. This creates /mnt/usb/braid.key (4096 bytes of random data) and adds it to LUKS slot 1 on every disk in the pool. You’ll be prompted for the pool passphrase.
Common variations
Enroll an existing keyfile (already at /mnt/usb/braid.key):
sudo braid enroll /mnt/usb
Non-interactive (passphrase from stdin):
echo -n 'my-passphrase' | sudo braid enroll /mnt/usb --generate --passphrase-stdin
Passphrase from a file:
sudo braid enroll /mnt/usb --generate --passphrase-file /root/passphrase.txt
Dry run (preview what would happen):
sudo braid enroll /mnt/usb --generate --dry-run
Flags
| Flag | Effect |
|---|---|
| --generate | Create a new 4096-byte random keyfile before enrolling; the target directory must already be a mount point |
| --passphrase-stdin | Read passphrase from stdin instead of TTY prompt |
| --passphrase-file <path> | Read passphrase from a file instead of TTY prompt (conflicts with --passphrase-stdin) |
| --dry-run | Show what would happen without making changes |
What happens under the hood
- Checks for a pending operation journal (refuses if one exists).
- With --generate: Validates that the target directory exists, is a directory, is already a mount point, and does not already contain braid.key (if a prior --generate run was interrupted, drop --generate and re-run to finish enrolling the existing keyfile; otherwise remove it manually first).
- Without --generate: Validates that DIR/braid.key already exists and is a regular file.
- Scans pool membership for present LUKS disks. Absent or non-LUKS disks are skipped with a message. If a present disk’s live LUKS UUID does not match the UUID recorded in pool.json – the disk was swapped, cloned, or reformatted – enrollment aborts before any passphrase prompt or slot change; detach the foreign disk and reattach the original, or run braid replace if the swap was intentional.
- Verifies the passphrase against every present pool disk before any keyfile probe.
- Without --generate: Probes the keyfile against each disk. If it authenticates, reports “already enrolled” and skips that disk for the rest of enrollment. A rejected probe means the disk still needs enrollment; any other probe failure (e.g. device busy) aborts immediately rather than treating the disk as un-enrolled.
- For each disk still needing enrollment, checks LUKS slot 1: proceeds if free; refuses with an error if occupied by an unknown key (you must remove it first with cryptsetup luksKillSlot).
- With --generate: Only after all preflight checks pass, generates the random keyfile.
- Enrolls the keyfile into LUKS slot 1 on each disk.
- Creates a LUKS header backup for each modified disk.
See Pending LUKS header backups – copy each .luksheader off-system and delete the local copy.
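The per-disk preflight outcome from the steps above can be sketched as follows. The probe result labels ("accepted", "rejected", "error") and the function name are invented for illustration; passing None models the --generate case, where no keyfile exists yet to probe.

```python
def enrollment_action(probe_result, slot1_occupied):
    """Decide what happens to one disk during enroll preflight.

    probe_result: "accepted", "rejected", "error", or None when no probe
    is run (the --generate case).
    """
    if probe_result == "accepted":
        return "skip"    # keyfile already authenticates: already enrolled
    if probe_result == "error":
        return "abort"   # e.g. device busy: never assume un-enrolled
    if slot1_occupied:
        return "refuse"  # slot 1 holds an unknown key
    return "enroll"


print(enrollment_action("rejected", slot1_occupied=False))
```

Only a clean rejection counts as "needs enrollment", which keeps a transient probe failure from leading to an unnecessary slot change.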
Safety checks
- Refuses if a pending operation journal (pending-op.json) exists – run braid recover to reconcile.
- Refuses if another braid operation is in progress (pool lock /run/braid-pool.lock is held) – retry once it finishes.
- Refuses if a present disk’s live LUKS UUID no longer matches its pool.json record – the disk was swapped, cloned, or reformatted; detach the foreign disk and reattach the original, or run braid replace if the swap was intentional. This UUID check is repeated at the mutation boundary, after the passphrase is read and before any keyfile is enrolled, so a disk swapped during the passphrase prompt is still caught before slot 1 is touched.
- With --generate, refuses unless the target directory is already a mount point.
- Passphrase is verified before any mutations.
- Slot 1 conflicts are detected before the keyfile is generated, so you never end up with an orphan keyfile.
- With --generate, refuses if braid.key already exists at the target path; if a prior --generate run was interrupted, drop --generate and re-run to finish enrolling the existing keyfile.
- Without --generate, refuses if the keyfile doesn’t exist.
- Idempotent: if the keyfile is already enrolled on a disk, that disk is skipped.
Related commands
- unlock – use --key-file to unlock with the enrolled keyfile
Related guides
braid discover
Note
Experimental 🧪
This command is experimental: the idea or implementation is still uncertain and may be removed, replaced, or overhauled before braid v1.0.
Scans /dev/disk/by-id/ for LUKS devices with braid-* labels, reads their LUKS UUIDs, and reconstructs UUID-keyed pool membership. This is a repair tool for recovering a lost or corrupt pool.json.
When to use it
- Your pool.json was deleted or corrupted.
- You’re migrating disks to a new machine and need to rebuild pool state.
The normal path for adding disks is braid add. Use discover only when pool.json is missing or corrupt – it refuses to run while a valid pool.json exists. To see the disks already in a healthy pool, use braid status.
Basic example
When pool.json is missing, preview the membership discover would rebuild before saving it (no changes):
sudo braid discover
Output:
ironwolf = /dev/disk/by-id/ata-ST12000VN0008_XXXXXXXX
toshiba = /dev/disk/by-id/ata-TOSHIBA_MN08ACA16T_XXXXXXXX
pass --write to save to /var/lib/braid/pool.json
Bare discover prints this preview only when pool.json is absent. Over a valid pool.json it exits with an error – use braid status to view current membership. Over a corrupt pool.json it also refuses, pointing you to discover --write (see Common variations).
The membership rows are written to stdout; the pass --write to save hint, the
--write “pool membership written” confirmation, scan warnings, and errors go
to stderr. So braid discover > members (or braid discover | grep <disk>)
captures only the rows.
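Since only the membership rows reach stdout, a script can parse them without filtering out hints or warnings. The sketch below assumes each row has the name = path shape shown in the example output above; the helper name is illustrative.

```python
def parse_membership(stdout_text):
    """Parse 'name = path' rows from braid discover stdout into a dict."""
    members = {}
    for line in stdout_text.splitlines():
        name, sep, path = line.partition(" = ")
        if sep:  # ignore blank or unexpected lines
            members[name.strip()] = path.strip()
    return members


SAMPLE_ROWS = (
    "ironwolf = /dev/disk/by-id/ata-ST12000VN0008_XXXXXXXX\n"
    "toshiba = /dev/disk/by-id/ata-TOSHIBA_MN08ACA16T_XXXXXXXX\n"
)

print(parse_membership(SAMPLE_ROWS))
```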
Common variations
Write the discovered membership to pool.json:
sudo braid discover --write
If you can name the expected member count ahead of time, pass it as a fail-closed guard against a detached disk or stray braid-labeled disk:
sudo braid discover --write --expect-count 3
Flags
| Flag | Effect |
|---|---|
| --write | Persist the discovered membership to pool.json |
| --expect-count <N> | With --write, refuse to write if the discovered member count is not exactly N |
What happens under the hood
1. With --write, refuses if a pending operation journal (pending-op.json) exists. Bare discover is read-only and skips this gate.
2. Refuses over an existing UUID-keyed pool.json (bare and --write). A corrupt or off-schema pool.json is the documented rebuild path: bare discover prints the rebuild remediation, and discover --write writes a forensic pool.json.corrupt-<RFC3339-UTC> snapshot adjacent to the new file, then rebuilds. If the snapshot cannot be written (full disk, read-only state directory), discover --write refuses rather than destroy the corrupt original.
3. Reads all entries in /dev/disk/by-id/ in sorted filename order, skipping partition entries (e.g., ata-TOSHIBA-part1). Sorting up front makes label-collision reporting (step 10) independent of read_dir order.
4. Resolves each by-id symlink to its canonical kernel device. Skips with a cannot canonicalize warning when the symlink is dangling (e.g., udev didn’t clean up after a disk removal).
5. For each entry, runs cryptsetup isLuks to check if it’s a LUKS device.
6. Runs cryptsetup luksDump to read the LUKS label, version, and UUID.
7. Skips LUKS1 devices (braid requires LUKS2).
8. Matches labels of the form braid-<name> and extracts the disk name.
9. Uses the canonical kernel device resolved above to detect multiple /dev/disk/by-id/ symlinks for the same physical disk (e.g. wwn- and ata- aliases), then picks the most stable one (preference order: wwn > nvme > scsi > ata > usb > other, with lexicographic tie-breaking).
10. If two symlinks that share the same braid-<name> label resolve to different kernel devices, refuses the entire scan with an error. Two physically distinct disks share a label – typically after a dd clone or a manual mislabel – and braid cannot safely choose one. Relabel or detach one disk before retrying.
11. If two distinct devices share one LUKS UUID, refuses the entire scan. This usually means a cloned disk is attached.
12. With --write, saves the discovered UUID-keyed membership to pool.json.
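The alias selection rule (wwn > nvme > scsi > ata > usb > other, with lexicographic tie-breaking) and the partition skip can be sketched as below. Matching on filename prefixes and on a -partN suffix is an assumption about how the categories are recognized; the function names are illustrative.

```python
import re

# Preference order from the docs; anything else ranks last.
PREFERENCE = ("wwn-", "nvme-", "scsi-", "ata-", "usb-")
PARTITION_SUFFIX = re.compile(r"-part\d+$")


def alias_rank(name):
    """Lower rank means a more stable by-id alias."""
    for rank, prefix in enumerate(PREFERENCE):
        if name.startswith(prefix):
            return rank
    return len(PREFERENCE)


def pick_stable_alias(names):
    """Pick the preferred whole-disk alias among symlinks for one physical disk."""
    whole_disks = [n for n in names if not PARTITION_SUFFIX.search(n)]
    # Sort key: preference rank first, then the name itself as the tie-breaker.
    return min(whole_disks, key=lambda n: (alias_rank(n), n))


print(pick_stable_alias(["ata-TOSHIBA_X", "wwn-0x5000039b", "wwn-0x5000039b-part1"]))
```

As written, pick_stable_alias raises ValueError when every candidate is a partition entry, so callers would need to handle an empty candidate list.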
Safety checks
- Refuses any operation on an existing UUID-keyed pool.json. Corrupt or off-schema files are allowed for --write rebuild only; the original is copied to pool.json.corrupt-<RFC3339-UTC> before overwrite, and --write refuses if that snapshot cannot be written (full disk, read-only state directory). Run with all intended pool members attached; see docs/internals/luks-unlock.md.
- With --write, refuses if a pending operation journal (pending-op.json) exists – run braid recover to reconcile.
- With --write, refuses if another braid operation is in progress (pool lock /run/braid-pool.lock is held) – retry once it finishes.
- With --expect-count, refuses to write if the discovered member count is not exactly the requested count.
- Without --write, makes no changes at all – read-only scan that takes no pool lock and does not consult the pending-op journal.
- Dangling /dev/disk/by-id/ symlinks are skipped with a warning – a diagnostic operators need when udev leaves a stale alias behind after a disk swap.
- LUKS1 devices are skipped with a warning.
- If no braid-labeled LUKS2 devices are found, discover exits 1 with no braid-labeled LUKS2 devices found -- ... (both bare and --write) – check the intended members are attached, readable, and labeled braid-<name> as LUKS2. An array that is entirely LUKS1, detached, or unreadable lands here, with any present-but-skipped disk warned about above.
- Refuses the scan if two distinct devices share the same braid-<name> LUKS label – relabel or detach one disk before retrying.
- Refuses the scan if two distinct devices share the same LUKS UUID – detach the cloned or unintended disk before retrying.
Related commands
- recover – resume an interrupted operation (has its own membership rebuild from live pool state)
- status – view current pool membership
braid recover
Note
Experimental 🧪
This command is experimental: the idea or implementation is still uncertain and may be removed, replaced, or overhauled before braid v1.0.
Resumes from an interrupted operation (add, remove, replace) by opening LUKS devices, mounting the pool, rebuilding pool.json from live pool state when appropriate, running owed maintenance when the btrfs balance state is idle, and clearing the pending-operation journal only after the safe recovery path completes.
When to use it
- After a crash, power failure, or interrupted braid command.
- When braid status or other commands show “pending operation – run braid recover”.
- Only available when pending-op.json exists.
Basic example
sudo braid recover
You’ll be prompted for the pool passphrase. Output shows the recovery process:
Recovering from interrupted "add" operation (started 2026-03-15T14:30:00Z)...
pool.json written from completed add membership.
pool.json written from committed add membership.
pending-op.json cleared. Recovery complete.
Before the pool.json lines, a real run prints either per-disk LUKS-open and mount rows (if the pool was offline) or a single pool already mounted at ... row (if it was already mounted). On the idle/no-paused owed RAID1 path, after the committed line it prints a RAID1 soft-balance replay row pair before the final pending-op.json cleared line. If the balance check is paused, running, or unknown, recover fails before the replay row and does not clear the journal.
Important
If recover refuses owed RAID1 replay because btrfs balance state is paused, running, or unknown, it leaves pending-op.json in place. Inspect btrfs manually before clearing recovery state.
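The fail-closed gate on owed RAID1 replay can be sketched as a small function. The state strings mirror the words used above (idle, paused, running, unknown); the function name and the returned dictionary shape are illustrative assumptions.

```python
def owed_balance_outcome(balance_state):
    """Decide whether recover may replay the owed RAID1 soft balance."""
    if balance_state == "idle":
        return {"replay": True, "clear_journal": True}
    # paused, running, unknown, or anything unrecognized: fail closed
    # and keep pending-op.json so recovery can be retried later.
    return {"replay": False, "clear_journal": False}


for state in ("idle", "paused", "running", "unknown"):
    print(state, owed_balance_outcome(state))
```

Treating every non-idle state the same way means an unrecognized state can never clear the journal by accident.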
Common variations
Non-interactive (passphrase from stdin):
echo -n 'my-passphrase' | sudo braid recover --passphrase-stdin
Passphrase from a file:
sudo braid recover --passphrase-file /root/passphrase.txt
Recover with a missing disk (degraded mode):
sudo braid recover --allow-degraded
Preview what would happen:
sudo braid recover --dry-run
Flags
| Flag | Effect |
|---|---|
| --passphrase-stdin | Read passphrase from stdin instead of TTY prompt |
| --passphrase-file <path> | Read passphrase from a file instead of TTY prompt (conflicts with --passphrase-stdin) |
| --allow-degraded | Allow mounting with missing devices (redundancy is reduced until you replace the missing device) |
| --dry-run | Show what would be done without making changes |
| --progress auto\|always\|never | Control progress display (default: auto) |
What happens under the hood
- Loads pending-op.json (refuses if absent – nothing to recover).
- Chooses the mount membership from the journal phase. Existing-pool add and remove-missing PoolMutation phases mount from the pre-operation membership. Add, remove-missing, and replace post-maintenance phases mount from the committed target membership. Replace PoolMutation, bootstrap add PoolMutation (the first disk, whose pre-operation membership is empty), and Remove mount from the admission membership (pre-operation snapshot plus target-only members) – for replace this matters because the kernel may still be completing dev_replace.
- Opens LUKS devices and mounts the pool (or reuses the existing mount if already mounted). Exception: a Replace::PoolMutation journal on an externally-mounted pool is refused (see Safety checks); replace post-maintenance recovery on an already-mounted pool is allowed.
- For Replace::PoolMutation only, if a kernel-resumed btrfs replace is in progress, waits for it to finish.
- For Replace::PoolMutation only, if the pool was just mounted by this recover run, performs a full relock-and-remount cycle (umount, btrfs device scan --forget, close LUKS, reopen, remount) to ensure the kernel’s in-memory device topology matches the on-disk state.
- Probes the live pool to discover actual membership.
- For interrupted existing-pool add PoolMutation, first runs a non-destructive Add target reconciliation pass: any journaled add target whose underlying disk is physically present and LUKS-openable is opened, scanned, and followed by a live-pool re-probe. Targets that turn out to be live pool members are adopted into the recovered pool.json without wipefs or btrfs device add.
- For add PoolMutation, replays only journaled targets that are not already live. RecoverableBraidLabeled targets are replayed via wipefs --all --types btrfs plus btrfs device add -f after LUKS UUID and visible-FSID checks. FreshLuks targets that are physically present are replayed from the journaled format options, skipping format if the disk already has the expected LUKS label; if the journal carried enroll_key_file, the keyfile is re-enrolled, then the LUKS header is backed up, the mapper is opened, and btrfs device add runs without -f. FreshLuks targets that are physically absent or carry an unexpected LUKS label make recover fail and leave pending-op.json in place so the disk can be reattached or replaced and recovery rerun.

  See Pending LUKS header backups – copy each .luksheader off-system and delete the local copy.
- For add PostAddBalanceRaid1, does not format, enroll, back up headers as target prep, wipe, or add disks. It only validates the committed live pool and runs the owed RAID1 balance when btrfs balance state is idle; a paused, running, or unknown balance state fails closed with the journal preserved.
- For replace and remove-missing PoolMutation, detects whether the primary btrfs membership mutation committed. If it did not commit, recover restores/keeps the pre-operation pool.json, clears the journal, and tells you to rerun the original command. It does not rerun btrfs replace start or btrfs device remove.
- For replace and remove-missing post-maintenance phases, validates committed live membership, repairs pool.json if needed, and finishes only owed maintenance such as resize or, when btrfs balance state is idle, soft RAID1 balance; it does not rerun the primary btrfs membership mutation. A paused, running, or unknown balance state before owed RAID1 replay fails closed with pending-op.json preserved.
- Resolves /dev/disk/by-id/ paths from live LUKS UUIDs, using btrfs devid only for missing or null-underlying bindings (not from the journal’s by-id path, which may be stale).
- Writes or repairs pool.json only after the journal phase allows it and live membership is complete.
- Clears pending-op.json only after membership is complete and any owed balance work is done.
Safety checks
- Refuses if no pending-op.json exists.
- Refuses if another braid operation is in progress (pool lock /run/braid-pool.lock is held) – retry once it finishes.
- Refuses to adopt live pool members outside the recovery admission membership for the current journal phase (guards against devices added outside braid). Most phases admit the pre-operation snapshot plus target-only members; Replace::PostReplaceMaintenance admits only the committed target membership because btrfs preserves the old device’s devid on the replacement after commit.
- Hard-fails if a live pool device has no /dev/disk/by-id/ symlink (recovery can’t guess a stable identifier).
- Detects interrupted bootstrap add (first disk, no filesystem yet) and gives specific wipe-and-retry instructions instead of a confusing mount error.
- Refuses to overwrite pool.json or clear pending-op.json if the post-mount probe at the configured mount point sees the pool unmounted or with zero btrfs devices. The mount may have been removed externally between recover’s mount step and membership probe; pool.json and pending-op.json are both preserved – investigate the mount, then re-run braid recover.
- For existing-pool add recovery, refuses to clear the journal while any journaled add target is missing from the live pool.
- Returned-disk replay may need a pool passphrase even when the pool is already mounted, because the mapper for the journaled target may still be closed.
- Without --allow-degraded, refuses to mount if devices are missing (exit code 2 for degraded-refused, distinguishing it from other errors).
- Refuses to recover Replace::PoolMutation when the pool is already mounted (admin-mounted, circumventing braid’s pending-op preflight on unlock). The kernel may have resumed an interrupted dev_replace on that mount session, leaving stale in-memory device state that recover cannot scrub without unmounting – which it will not do on a mount it does not own. Remediation: sudo braid lock; sudo braid recover.
Related commands
- status – shows pending operation state and prompts you to recover
- discover – rebuild UUID-keyed pool.json from LUKS labels and UUIDs (when there’s no journal)
- unlock – normal unlock (when no journal exists)
Related guides
braid tui
Interactive terminal dashboard showing pool state, disk health, allocation, scrub status, and active alerts.
When to use it
- Quick visual overview of your NAS health.
- Checking disk-level detail (LUKS cipher, SMART health, error counts, transport).
- Monitoring during or after a scrub.
Basic example
sudo braid tui
Demo mode
Try the TUI without a real pool (no config or btrfs required, no root required):
braid tui --demo
Demo mode shows three fake disks with sample data, useful for exploring the interface.
Flags
| Flag | Effect |
|---|---|
| --demo | Run with fake data (no config, btrfs, or root required) |
Keybindings
| Key | Action |
|---|---|
| q | Quit |
| r | Reload pool data now |
| Tab | Next tab |
| Shift-Tab | Previous tab |
| j / k | Select next/previous disk (Data/Scrub) or move within the focused Browse region |
| h / l | Move left/right across Browse regions |
| Ctrl-D / Ctrl-U | Page Browse content down/up (one screen at a time) |
| Enter | Open disk detail popup (Data) or drill into Browse content |
| Esc | Close disk detail popup or return from Browse drill-in |
| ? | Toggle help overlay |
| Shift-R | Reset session temperature hi/lo watermarks |
What it shows
Main view – pool status, mount point, the Profile summary
(data <X> | meta <Y> | system <Z>, where each value is the profile name
verbatim for a single recognized profile such as RAID1, DUP, or single;
partial when that block-group type spans more than one profile; the raw
profile name verbatim for an unrecognized profile like RAID5; or unknown
only when no block groups of that type were reported), capacity bar, balance
state, and active alerts and advisories.
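The Profile summary rule above can be sketched as a small function. This is an illustrative reading of the described behavior, not braid's actual code; the function names `summarize_profiles` and `profile_line` are hypothetical.

```python
from typing import Iterable


def summarize_profiles(profiles: Iterable[str]) -> str:
    """Collapse the profiles seen for one block-group type into one label."""
    distinct = list(dict.fromkeys(profiles))  # de-duplicate, keep first-seen order
    if not distinct:
        return "unknown"   # no block groups of this type were reported
    if len(distinct) > 1:
        return "partial"   # the type spans more than one profile
    return distinct[0]     # one profile: shown verbatim (RAID1, DUP, RAID5, ...)


def profile_line(data, meta, system) -> str:
    """Render the summary in the 'data X | meta Y | system Z' layout."""
    return (
        f"data {summarize_profiles(data)} | "
        f"meta {summarize_profiles(meta)} | "
        f"system {summarize_profiles(system)}"
    )
```

For example, a pool whose metadata is mid-conversion between two profiles would render its meta value as partial, while a type with no reported block groups renders as unknown.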
Refreshing – while the pool is mounted, pool, disk, scrub, and alert data
refresh automatically about every 10 seconds and immediately when you press
r. While the pool is not mounted, that data stays manual-only via r. When
enabled, Fans and UPS telemetry also refresh automatically every 5 seconds and
immediately on r. The footer’s Reload: r spinner and idle (Xms) duration
reflect pool refreshes, including automatic pool refreshes; automatic Fans/UPS
polls do not update it. The view redraws periodically while idle so relative
ago times stay current.
Disk table – one row per disk: number, name, bus (sata/usb/nvme), SMART health, temperature, btrfs device-error count, and allocated (shown as percent used and allocated/size).
Fans (when fan control is enabled) – Data-tab row with a daemon:
header annotation for hddfancontrol-braid.service: active is green,
activating and inactive are yellow, failed is red, and unknown is
gray. The annotation is not a column; the columns are PWM (raw/255 plus
percent), RPM, Driving (the hottest drive and its temperature), and Curve.
See the fan control guide.
UPS (when UPS support is enabled) – Data-tab row with the same
daemon: header annotation for the NUT daemon. The columns are Status
(color-coded flags), Battery, Runtime, and Load. See the
UPS guide for Status severity.
Disk detail popup (press Enter on a disk) – disk name, LUKS lock status, cipher, key size, keyslot count, an allocations table (type/profile/size plus unallocated), the btrfs device-error breakdown (read/write/flush/corruption/generation), and a SMART section with the health verdict plus its supporting evidence rows (per-protocol: SATA reallocated/pending/uncorrectable, or NVMe critical-warning/media-errors/available-spare/percentage-used). A row for an out-of-spec attribute is colored red. Temperature is not repeated here – it has its own column in the disk table.
Tabs – three tabs, switched with Tab / Shift-Tab:
- Data (default) – pool allocation breakdown, disk table, capacity bar, plus Fans and UPS rows when enabled.
- Scrub – per-device scrub state, progress, and timing.
- Browse – raw CLI output inspector across five tool families: Btrfs, NUT (UPS), Systemd, SMART (smartctl), and lsblk. Btrfs views include filesystem usage/show/df/commit-stats, device usage/stats, subvolumes with drill-in plus raw full/snapshot/deleted/default views, scrub status/limits, balance status, quota status/qgroups, and inspect-internal chunks. UPS views include status, raw variables, supported instant commands, connected clients, settable variables, and UPS discovery. Systemd views include unit status, show, braid units, failed units, timers, and mounts. SMART views include device scan, health, info, attributes, and self-test/error logs. lsblk views include tree, filesystems, disks, all-columns, and SCSI.
NUT > UPSes can help find the correct ups.name before UPS support is enabled.
Related commands
- status – non-interactive pool health output
- ups status – non-interactive UPS state output
braid ups status
Note
Experimental 🧪
This command is experimental: the idea or implementation is still uncertain and may be removed, replaced, or overhauled before braid v1.0.
Query the UPS (NUT) daemon for the currently configured UPS and render a curated human summary or the serialized parsed model as JSON.
Requires UPS support enabled (braid.enable = true and
braid.ups.enable = true). With UPS disabled the command prints an enable
hint and exits 0 (not an error).
Basic example
sudo braid ups status
Output:
UPS: ups
Status: OL
Battery: 100%
Runtime: 30m 0s
Load: 17% (56 W estimated)
Input: 120.0 V (transfer 88-142 V)
Device: APC Back-UPS ES 550G
Battery manufactured: 2023/04/12
Last test: Done and passed
JSON output
sudo braid ups status --json | jq .
Emits the serialized UpscOutput model. A success body (no top-level
error) is trustworthy telemetry: braid faithfully serialized
whatever upsc reported. It is not a claim that the UPS is online –
on-battery (OB), low-battery (OB LB), and all-unrecognized status sets
are all success bodies with no error and no warning.
To judge UPS state, read status_flags: utility power is proven only by
the presence of OL with no blocking flag (OB, LB, TESTFAIL,
COMMBAD, FSD) – the same affirmative-OL criterion braid’s own
mutation preflight uses (see
the UPS guide).
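A script consuming status_flags might apply the affirmative-OL criterion like this. It is a minimal sketch of the rule as stated above, not braid's preflight code; `utility_power_proven` is a hypothetical name.

```python
# Flags that block the "utility power is proven" verdict, as listed above.
BLOCKING_FLAGS = {"OB", "LB", "TESTFAIL", "COMMBAD", "FSD"}


def utility_power_proven(status_flags):
    """True only when OL is present and no blocking flag accompanies it."""
    flags = set(status_flags)
    return "OL" in flags and not (flags & BLOCKING_FLAGS)
```

Note that an empty or all-unrecognized flag list returns False: the absence of bad flags is not treated as proof of utility power.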
Shape:
{
"status_flags": ["OL"],
"battery": {
"charge_pct": 100,
"runtime_secs": 1800,
"voltage": "27.0",
"type": "PbAc",
"mfr_date": "2023/04/12",
"runtime_low_secs": 120
},
"load_pct": 17,
"realpower_nominal_watts": 330,
"input": {
"voltage": "120.0",
"transfer_low": "88",
"transfer_high": "142",
"sensitivity": "medium"
},
"test_result": "Done and passed",
"device": {
"model": "Back-UPS ES 550G",
"mfr": "APC",
"serial": "3B1234X56789",
"type": "ups"
},
"extra": { "driver.name": "usbhid-ups", "battery.charge.low": "10" }
}
In a success body (the shape above – a reachable UPS, no top-level error),
every typed field is always present: a scalar the driver did not report
serializes as null rather than being omitted, and the battery, input, and
device objects are always present even when all of their fields are null.
Test typed fields for a null value, not for a missing key – a has(...)
check on any typed key always returns true. status_flags and extra are
always present but never null ([] and {} when empty). The only field
omitted when absent is the top-level warning (see the table below). Error
bodies are the exception – they carry error/detail and none of the typed
keys, so a script must confirm there is no top-level error before relying on
the rule above.
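As a sketch of that rule, a consumer could confirm there is no top-level error and then test a typed field for null. The function name `charge_percent` is hypothetical; the field names come from the shape shown above.

```python
import json


def charge_percent(body_text):
    """Return battery.charge_pct, or None when the driver did not report it."""
    body = json.loads(body_text)
    if "error" in body:
        # Error bodies carry none of the typed keys, so stop before indexing.
        raise ValueError(f"UPS error body: {body['error']}")
    # In a success body the key is always present; only its value may be null.
    return body["battery"]["charge_pct"]
```

Indexing directly is safe here only because the error check runs first; a key-presence check on a typed field would not distinguish reported from unreported values.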
status_flags lists flags in first-seen ups.status token order (whitespace normalized, duplicate tokens dropped); braid does not sort them, so the order is deterministic for a given UPS state.
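The described normalization (split on whitespace, keep first-seen order, drop duplicates) can be illustrated in a few lines. This is a sketch of the documented behavior, not braid's parser; `parse_status_flags` is a hypothetical name.

```python
def parse_status_flags(ups_status: str) -> list:
    """Split a raw ups.status value into ordered, de-duplicated flags."""
    tokens = ups_status.split()          # collapses any run of whitespace
    return list(dict.fromkeys(tokens))   # dicts preserve insertion order
```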
extra is a string-keyed map of every upsc line that did not land in a typed field above. Its contents vary with the NUT driver and version (typically driver.* debug keys plus other untyped fields like battery.charge.low or input.voltage.nominal), and values are kept verbatim as strings.
Distinct sentinels cover the common non-OK cases:
| Condition | JSON | Exit code |
|---|---|---|
| UPS reachable with populated ups.status | serialized UpscOutput | 0 |
| UPS reachable but ups.status empty | serialized UpscOutput plus "warning": "ups_status_empty" | 0 |
| UPS query failed | {"error": "query_failed", "detail": "exit <code>: <stderr>"} | 1 |
| UPS invocation failed (upsc could not run – missing on PATH, killed by signal, or other runner-level failure) | {"error": "invocation_failed", "detail": "command failed: upsc ups: <reason>"} | 1 |
| UPS not enabled | {"error": "ups_not_enabled"} | 0 |
If error or warning is present, do not treat the typed body as
healthy UPS state. For these cases, --json writes only to stdout –
stderr stays silent so the JSON sentinel can be piped into a single
sink (jq, tee, CI logs) without a redundant human error line.
Other failure modes, such as malformed config, still print a human
error to stderr.
The converse does not hold: the absence of error and warning does
not by itself mean the UPS is online – inspect status_flags as above.
ups_status_empty fires only when ups.status is empty or missing (no
flags to read), so it is not a general health signal.
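One way a script might branch on the sentinels in the table above is to classify the body before reading any telemetry. This is a hedged sketch; `classify` and its return labels are hypothetical, not part of braid.

```python
import json


def classify(body_text: str) -> str:
    """Label a --json body as an error, a warning, or plain telemetry."""
    body = json.loads(body_text)
    if "error" in body:
        return f"error:{body['error']}"      # e.g. query_failed, ups_not_enabled
    if "warning" in body:
        return f"warning:{body['warning']}"  # e.g. ups_status_empty
    return "telemetry"                       # still inspect status_flags
```

A "telemetry" result only means the body is trustworthy; it says nothing about whether the UPS is on utility power.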
Flags
| Flag | Effect |
|---|---|
| --json | Emit parsed upsc model as JSON; stable shape for scripts |
Related
- UPS guide – shutdown path, preflight refusal, v1 limitations
- tui – the TUI’s Data tab shows the same live UPS state
- doctor – UPS-adjacent configuration checks
Principles
Canonical invariants for braid. Each principle is authoritative — if code or config contradicts a principle, the code is wrong.
1. Resilient by default
Data drives never block boot. The pool is unlocked and mounted by explicit CLI invocations (braid unlock, the braid-auto-unlock.service unit, or braid recover during recovery), not by systemd mount units. No LUKS or btrfs units are generated at build time. Degraded mounts require explicit --allow-degraded — braid refuses to silently run with zero redundancy. Why →
2. CLI-owned membership
Disk membership is runtime state owned by the CLI, stored in /var/lib/braid/pool.json. Adding or removing a drive is braid add name=/dev/disk/by-id/... — no nixos-rebuild required. The NixOS module provides the mount point, services, and toolchain; the CLI owns which disks are in the pool. unlock requires pool.json to exist and be valid — it never creates or repairs it. Recovery is explicit via braid discover --write. Why →
pool.json is a best-effort operational snapshot — it tells braid which drives to attempt unlocking, not what the pool actually looks like. Any state that can be read from live btrfs (devids, device counts, FSID) must come from btrfs, not pool.json. Commands like status must never surface pool.json-sourced devids; for display authority, devids are authoritative only when read from a mounted filesystem via btrfs device usage or equivalent. Persisted DiskMember.devid carries prior-binding authority only: when live btrfs reports a device by devid alone (the null_underlying mapper case and the btrfs missing_devids case), the persisted devid is the authorized fallback binding for re-attaching that live device to its membership entry. This is not a display-side use of pool.json devid; status output continues to draw devids from live btrfs. Why →
3. Safe-by-construction operations
- Each intent command (add, remove, remove-missing, replace) does exactly one thing with risk-appropriate confirmation. replace always uses btrfs replace start – for live disks it replaces in-place, for missing disks it rebuilds from RAID redundancy using the missing device’s devid. remove-missing cleans up a stale missing-device entry; it never rebuilds data onto a new device (that is replace). When clearing the last missing device with ≥2 devices remaining, both remove-missing and replace (missing path) run a follow-up soft balance to restore RAID1 profiles for chunks written during degraded operation.
- Post-commit persist with journal: mutating commands write a pending-operation journal (pending-op.json) with pre/target membership snapshots before the first irreversible disk operation. pool.json is written once the btrfs membership change has committed, so it reflects committed live membership, not necessarily completion of follow-up maintenance such as RAID1 rebalance or resize. Phased journals advance to post-maintenance after the committed pool.json write; those post phases must never rerun the primary btrfs membership mutation. The journal is cleared only after the entire lifecycle succeeds, including required post-mutation maintenance like soft balance. While the journal exists, braid recover replays owed maintenance when btrfs balance state is idle and fails closed with the journal preserved when owed RAID1 replay finds a crash-paused, running, or unknown balance state. If braid crashes or fails mid-operation, the journal triggers recovery mode: membership/mount/key-enrollment commands (add, remove, remove-missing, replace, unlock, enroll, discover --write) hard-fail; read-only diagnostic and cleanup surfaces (status, doctor, lock, bare discover) stay available. braid recover rebuilds membership from the live mounted pool (not LUKS label scanning) and is the only command that clears the journal.
- Environment-side resource acquisition (file locks, sleep inhibitors, dbus/logind handshakes, external service availability) must happen before journal::write_journal. The journal write commits the user to recovery mode on any subsequent failure, so a pure environment failure (logind unreachable, flock contention) must not leave a stranded pending-op.json for what was conceptually a “command never started” failure. The journal write is the line of no return; reorder code so any RAII guards or environment probes that can fail are bound above it. The per-command pre-journal excluded scope (which also covers reversible validation and identity checks) is enumerated in ADR 019.
- Disk names are immutable once recorded in pool membership; name rename/reassignment is rejected by mutating commands and must use explicit replace or remove + add workflows.
- mkfs.btrfs is gated on bootstrap only – bootstrap accepts only disks classified as fresh non-LUKS during add planning, and the LUKS open helpers verify that any pre-existing braid-<name> mapper is backed by the requested by-id disk before pool creation proceeds. mkfs.btrfs is invoked without -f so its own libblkid signature check is the final fail-closed guard.
- An existing LUKS device or pool member is never reformatted – a multi-layer identity check (LUKS label match, LUKS UUID cross-check against pool.json, pool-mounted requirement, btrfs FSID comparison) prevents accidental data loss, with the btrfs superblock guard as defense-in-depth.
- Failed unlock and recovery mount paths close only LUKS mappers braid newly opened during that invocation. They never close pre-existing operator-owned mappers, including mappers that become already open between planning and execution.
- Mounts always include skip_balance – btrfs silently resumes interrupted balances on mount by default, which can re-trigger ENOSPC or surprise the user with heavy I/O. braid manages balance lifecycle explicitly; unlock warns if a paused balance is detected.
- The bare pool mountpoint is sealed immutable (chattr +i) while the pool is offline, so a process writing it before mount fails with EPERM instead of silently landing on the root filesystem and being shadowed when the pool mounts. The seal is always-on (no knob), lives only in the boot/activation unit (braid-seal-mountpoint), and persists across lock/unlock. Why →
- Dry-run previews for migrated mutating commands are rendered from the same typed work plans that execution consumes; Step is output-only. Why ->
- Why →
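The ordering described in the journal bullets above (environment guards first, journal write as the line of no return, pool.json after the committed mutation, journal cleared last) can be sketched as follows. This is a simplified Python illustration, not braid's Rust implementation; `run_mutation`, `acquire_environment`, `mutate`, and the JSON contents are all hypothetical, and only the two file names come from the text.

```python
import json
import os


def run_mutation(state_dir, acquire_environment, mutate):
    """Run one mutating operation with the journal written after env guards."""
    journal = os.path.join(state_dir, "pending-op.json")
    pool = os.path.join(state_dir, "pool.json")

    # Anything that can fail for purely environmental reasons is acquired
    # first, so a failure here leaves no stranded journal behind.
    with acquire_environment():
        # Line of no return: from here on, any failure means recovery mode.
        with open(journal, "w") as f:
            json.dump({"op": "example"}, f)   # hypothetical journal body

        members = mutate()                    # the irreversible disk operation

        # Written only once the membership change has committed.
        with open(pool, "w") as f:
            json.dump({"members": members}, f)

        # Owed follow-up maintenance would run here, before the clear.
        os.remove(journal)
```

If `acquire_environment` raises, no pending-op.json exists; if `mutate` raises, the journal is left in place for a recovery step to find.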
4. Single passphrase
All drives share one LUKS passphrase. braid unlock and braid add depend on this — one passphrase unlocks all drives. Before any irreversible operation, every reachable existing LUKS device that will remain in or enter post-operation pool membership has its slot 0 verified. Fresh-format disks are excluded because they have no existing slot 0. The live-replace source is excluded when other retained members exist, so a divergent slot 0 on the disk being replaced does not block its own replacement. The same all-relevant-disk rule applies to keyfile credentials used by mount, unlock, and recover. Why →
Binary keyfile support is available via braid enroll (slot 1) and braid.autoUnlock (NixOS module). The passphrase (slot 0) is the default interactive-unlock mechanism; the slot-1 keyfile drives braid.autoUnlock for unattended boots and can also be passed directly to braid unlock --key-file.
5. Stable identifiers
All persistent storage config uses /dev/disk/by-id/ paths. Never /dev/sdX. Mapper names are braid-<disk-name> (e.g., braid-toshiba) — deterministic, human-friendly, debuggable in lsblk, systemd logs, and error messages. LuksUuid is the primary persistent identity for code; the disk name and the LUKS label are presentation; by_id is for hardware addressing. When the live LUKS UUID is unobservable for a device the kernel/btrfs still reports (null_underlying mapper, btrfs missing_devids), btrfs devid is the only authorized live-fallback binding key. No code path may decide membership, target a device, or correlate live pool state by parsing a name out of a mapper path or LUKS label, except in two narrow cases: discover bootstrapping a UUID-keyed membership from cold disks, and returning-disk adoption safety in add (the PresentLuks path may gate adoption on label match, but identity correlation still uses LuksUuid/devid/FSID). Why →
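To make the naming rules concrete, here is a small sketch that derives a mapper name and rejects non-by-id device paths. The helper names `mapper_name` and `require_by_id` are hypothetical and only illustrate the conventions stated above.

```python
BY_ID_PREFIX = "/dev/disk/by-id/"


def mapper_name(disk_name: str) -> str:
    """Deterministic mapper name for a disk, e.g. toshiba -> braid-toshiba."""
    return f"braid-{disk_name}"


def require_by_id(path: str) -> str:
    """Accept only /dev/disk/by-id/ paths; reject /dev/sdX and friends."""
    if not path.startswith(BY_ID_PREFIX) or path == BY_ID_PREFIX:
        raise ValueError(f"not a /dev/disk/by-id/ path: {path}")
    return path
```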
6. btrfs RAID1
Auto-healing checksums, dynamic drive pooling, in-kernel (no out-of-tree modules). 50% space overhead is accepted. btrfs RAID5/6 is not production-ready. Why →
7. Sane defaults
If a knowledgeable admin would always enable it, braid enables it by default. Use lib.mkDefault for simple pass-through defaults on stable NixOS options. Wrap in a braid.* option when the feature is inside braid’s product boundary and benefits from lifecycle control, discoverability, or a unified config surface – even if the mapping is 1:1. Examples: braid.autoScrub (periodic scrub with lifecycle binding to pool online state), poolAccessGroup for mount root access (root:storage 2770). Why ->
8. Test every design decision
NixOS VM tests validate behavior, not just command success. TDD: write failing tests first, confirm they fail for expected reasons, then implement.
9. NixOS-native
Braid only targets NixOS. No portability abstractions, no generic Linux fallbacks. Follow NixOS module conventions — same option types, patterns, and idioms as nixpkgs. When in doubt, nixpkgs is the tiebreaker. Why →
10. Pinned toolchain
Parser-critical tools (btrfs-progs, cryptsetup, util-linux, NUT, smartmontools, ethtool) are pinned to a specific NixOS stable release via the flake input. Wrappers execute with an explicit PATH built from module-controlled packages (braid.packages.*). Parsers assume the output format of the pinned version – upgrading those tools requires updating fixtures and parser tests. These pinned defaults are a compatibility baseline, not a lock; users may override braid.packages.* to pick up newer system versions when needed. Generic helpers (coreutils, systemd) come from the consumer’s package set and are not part of braid’s parser contract, except that Browse parses systemctl list-units --output=json as a tolerant UI-only picker with raw-output fallback. Why ->
11. HDD defaults
Mount options, LUKS flags, and scrub scheduling are chosen for HDD NAS deployments. Why →
12. One pool operation at a time
Rust dispatch acquires /run/braid-pool.lock before loading config, loading pool.json, probing pool state, or prompting for command input. The authoritative command-to-lock-discipline mapping lives in lock_policy in cli/src/main.rs; its wildcard-free exhaustive match makes every Commands variant choose a discipline at compile time.
Lock disciplines are policy categories, not prose-maintained command lists. Interactive mutators acquire non-blocking and fail fast with braid: another braid operation is already in progress so the user can retry once the active operation completes. Short-contention maintenance paths may wait for a bounded timeout, such as the 10-second alert acknowledgement window. Timer-driven monitoring may exit 0 silently on contention because a missed cycle is harmless and exit 1 would falsely start alert notification. Read-only paths and dry-run modes do not acquire the lock; bare discover is read-only, while its write mode participates because the scan -> pool.json write window must be serialized against pool-state mutators. Read-only diagnostics status and doctor never acquire the lock so operators retain a working diagnostic surface during contention; tests/module/pool-lock-readonly-bypass.py pins this invariant.
Mutual exclusion is enforced at the critical section itself, not via systemd unit topology. Under the held lock, unlock re-checks whether the pool is already mounted and exits cleanly if a prior winner mounted it sequentially; other mutators operate on the current locked state rather than stale pre-lock observations.
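The fail-fast discipline for interactive mutators can be illustrated with a non-blocking advisory lock. This Python sketch assumes an flock-style lock and a caller-supplied path; braid itself is written in Rust, and `try_acquire` is a hypothetical helper.

```python
import fcntl
import os


def try_acquire(path):
    """Take an exclusive lock without waiting; return the fd, or None if busy."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)   # another holder owns the lock: fail fast
        return None
    return fd          # lock is held until this fd is closed
```

A caller that gets None would print the "another braid operation is already in progress" message and exit, while a timer-driven monitor could instead exit 0 silently.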
13. Announce long-running work
Every interactive command emits a [wait] row before any subprocess
that can stall the terminal long enough for the user to wonder
whether the CLI has hung. The bound categories:
- cryptsetup Argon2 operations (luksFormat, luksOpen, luksAddKey, --test-passphrase);
- cryptsetup close (single attempt or busy-retry loop);
- btrfs balance, replace, and device remove (potentially hours);
- mount and umount (kernel can drain in-flight I/O / replace workers / inhibitors).
A [wait] row is closed by one of:
- the same command’s paired success row ([ok] {same subject}: ...) on the success path,
- a same-subject [fail] row on a known failure path (e.g. lock.rs’s umount failure),
- a same-subject [warn] row on a non-fatal best-effort failure (e.g. mapper_close::close_mapper_best_effort’s LUKS close, or wait_for_kernel_replace_to_finish’s status-poll error – the command continues despite the failure, and the warn row tells the user the wait window is closed without success),
- a same-subject [skip] row on a successful negative or no-op probe (e.g. braid enroll’s pre-mutation keyfile probe finding the keyfile not yet enrolled – the work the wait announced completed, the answer is “no work yet”),
- or the command’s normal error output (MountError / LuksError / PoolError propagation) on uncaught error paths.
A [wait] followed by none of these closers (i.e., success, fail,
warn, skip, or non-zero exit) is a documentation bug.
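A checker for this closing rule could look like the sketch below. It models rows as (tag, subject) pairs, which is an assumption about how a test harness might capture output, and `unclosed_waits` is a hypothetical name.

```python
CLOSING_TAGS = {"ok", "fail", "warn", "skip"}


def unclosed_waits(rows, exit_code=0):
    """Return subjects whose [wait] row was never closed."""
    pending = []
    for tag, subject in rows:
        if tag == "wait":
            pending.append(subject)
        elif tag in CLOSING_TAGS and subject in pending:
            pending.remove(subject)   # same-subject closer ends the wait
    if exit_code != 0:
        return []   # normal error output on a non-zero exit closes the rest
    return pending
```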
Fast bookkeeping that completes well under a second
(mkfs.btrfs on a fresh disk, btrfs device add,
btrfs filesystem resize, btrfs device scan,
btrfs device scan --forget, cryptsetup luksHeaderBackup,
cryptsetup status, blkid, JSON parses, journal writes,
pool.json saves, sysfs reads) does not warrant a row.
Rendering uses status_tag::status_line(StatusTag::Wait, ...)
against color_enabled_for_stderr() so plain stderr captures
contain unwrapped [wait] bytes and TTY output picks up the gray
ANSI tag. Why →
Implementation workflow and conventions are in AGENTS.md at the repo root.
Decision: btrfs RAID1
Principle: btrfs RAID1
Context
The NAS needs checksumming (bit rot detection), self-healing (automatic repair from redundant copy), and dynamic drive pooling (add/remove drives without reformatting). The filesystem sits on top of LUKS.
Options considered
ZFS raidz
- Checksumming + self-healing + RAID. Mature and well-tested.
- Rejected: out-of-tree kernel module. Licensing conflict means it can never be mainlined. NixOS supports it but it’s a second-class citizen — kernel updates can break the module, and the build dependency is heavy.
btrfs RAID5/6
- Same benefits as RAID1 with less overhead (parity instead of mirroring).
- Rejected: not production-ready. The write-hole bug has been a known issue for years. Data loss reports exist. The btrfs wiki explicitly warns against it.
SnapRAID + mergerfs
- Parity-based protection with independent drives. ~75% space efficiency with 3+1.
- Rejected: no auto-healing. SnapRAID syncs on a schedule (e.g., nightly). Bit rot between syncs is undetected. No checksumming on read. Drives are independent ext4 — good for recovery but no real-time protection.
btrfs RAID1
- Checksums every block on read, heals from the RAID1 copy automatically. Dynamic pool – btrfs device add/remove at any time with any size drive. In-kernel, first-class NixOS support. Simple stack: LUKS + btrfs.
- Accepted.
Decision
btrfs RAID1. The 50% space overhead is accepted as the cost of real-time auto-healing with a simple, in-kernel stack.
Tradeoffs accepted
- 50% space overhead — 3x 12TB = ~18TB usable. Parity schemes would give ~24TB.
- Fixed 2-way redundancy — btrfs RAID1 keeps exactly 2 copies of every block, regardless of pool size. A 3- or 4-drive pool tolerates one drive failure, the same as a 2-drive pool. Additional drives buy usable capacity, not extra fault tolerance. Higher-redundancy profiles (RAID1C3, RAID1C4) exist in btrfs but are not used by braid — the product’s redundancy story is “tolerate one drive failure.”
- No drive independence — drives are part of a btrfs pool, not individually mountable. Recovery requires a working btrfs toolchain.
- Rebalancing cost — adding or removing a drive triggers a balance operation that can take hours on large pools.
- Incremental growth — start with 1 drive (single profile, no redundancy), add a second to convert to RAID1. This is a feature, not a tradeoff — data is available immediately, protection comes when the second drive arrives.
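The capacity figures above follow from simple arithmetic. The sketch below uses the usual estimate for two-copy mirroring across mixed-size drives (half the total, capped by what the other drives can mirror against the largest one) and a single-parity estimate for comparison; both function names are hypothetical and filesystem metadata overhead is ignored.

```python
def raid1_usable(sizes_tb):
    """Approximate usable space with exactly two copies of every block."""
    total = sum(sizes_tb)
    largest = max(sizes_tb)
    # Every block needs a copy on a second drive, so the largest drive
    # can only be filled as far as the remaining drives can mirror it.
    return min(total / 2, total - largest)


def single_parity_usable(sizes_tb):
    """Approximate usable space for one-parity schemes on equal stripes."""
    return (len(sizes_tb) - 1) * min(sizes_tb)
```

With three 12 TB drives this gives 18 TB for RAID1 versus 24 TB for single parity, matching the tradeoff listed above. A lone drive yields 0 here because there is nothing to mirror against.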
Replacement strategy
Device replacement always uses btrfs replace start, including when the source device is missing. btrfs replace start <devid> supports replacing by devid when the source is unavailable, rebuilding from RAID1 mirrors. This is preferred over the alternative btrfs device add + btrfs balance + btrfs device remove approach because:
- No degraded balance: btrfs docs explicitly warn against balancing a degraded filesystem to lower redundancy. btrfs replace avoids this entirely.
- Devid preservation: the new device inherits the old devid, keeping the pool topology stable.
- Single operation: one btrfs replace start call vs. three separate commands with partial-failure risk.
braid remove-missing is retained for cleanup only (forgetting stale device entries), not for replacement.
When braid blocks a live replacement because the pool has missing devices, the intended next step is repairing the missing device via braid replace --old <missing-name> --new <new-name>=/dev/disk/by-id/<...> (the missing devid auto-resolves from --old), not forgetting it.
See
- cli/src/cmd.rs – base_mount_options() and the btrfs mount invocation
- tests/storage/btrfs-heal.nix – validates auto-healing
- tests/storage/btrfs-grow1.nix, tests/storage/btrfs-shrink.nix – validates dynamic pooling
Superseded by 017-runtime-disk-membership.md.
Decision: Config-First Workflow
Principle: CLI-owned membership (successor)
Context
NixOS is declarative — nixos-rebuild switch should describe the system’s desired state. But cryptsetup luksFormat and btrfs device add are destructive one-shot operations that cannot be made idempotent. Re-running them would destroy data.
Options considered
- Fully declarative — module handles formatting in an activation script. Simple but catastrophic if re-run.
- Fully imperative — script manages everything, module reads live state. Works but creates config drift (disk formatted but not in NixOS config; pool unlock breaks).
- Config-first hybrid — declare disk in NixOS config (source of truth), rebuild (creates LUKS entries that fail gracefully), then run imperative script to format. Script refuses undeclared disks.
Decision
Option 3. The NixOS config is the source of truth. The script is a one-shot executor that reads from it.
Workflow
- Add disk to braid.disks
- nixos-rebuild switch – module exports /etc/braid/config.json, creates LUKS entries (which fail gracefully since disk isn’t formatted yet)
- sudo braid init-disk /dev/disk/by-id/... – reads config, verifies disk is declared, formats LUKS (explicit, one-shot)
- sudo braid apply – opens LUKS, adds to btrfs pool, balances to RAID1 if applicable
- Next reboot auto-unlocks
Config export
The module writes /etc/braid/config.json via environment.etc. This is the single Nix→runtime bridge. All CLI tools read it by default. The file is built at nixos-rebuild time and is read-only at runtime.
Config drift prevention
The script refuses to format disks not listed in braid.disks. Error message tells the user exactly what to add and which commands to run. This ensures every formatted disk has a corresponding LUKS entry for pool unlock.
Symmetric guards
Config-first applies to all pool operations, not just add. The guard works in both directions:
- braid init-disk refuses disks not in braid.disks
- braid apply removes disks from pool when they are no longer in braid.disks
Remove workflow: remove disk from braid.disks → nixos-rebuild switch → sudo braid apply. See 007-disk-pool-management.md for full spec.
Constraint
Two-step process (rebuild + run script) instead of a single rebuild. This is the minimum viable approach given that LUKS formatting is destructive.
Revisit trigger
If NixOS ever gets a formatDevice option type that can safely express one-shot destructive operations, this deviation can be revisited.
See
- modules/braid/options.nix – braid.disks option definition
- modules/braid/storage.nix – config export and LUKS entry generation
- cli/src/ – Rust CLI (init-disk, plan, apply, status)
- archive/design-docs/1-nixos-best-practices.md – original best practices analysis (preserved in git history; last present at commit 9df91f9)
Decision: Resilient by Default
Principle: Resilient by default
Context
The OS lives on an internal SSD. Data drives are separate. Nothing about the data drives — bad config, dead drive, unplugged cable — should prevent the system from booting. The data pool is an external resource, like a network mount. The module tries to bring it up, but if it fails, the box is still a working Linux machine you can SSH into and fix.
Options considered
- Hard dependencies – LUKS required, mount required. Any failure blocks boot. Simple but means a dead drive = unreachable NAS.
- Degraded toggle – add an option like braid.allowDegraded = true. Default to hard failure, opt in to resilience. Adds complexity and a wrong default.
- Resilient by default – nofail, wants everywhere. Zero cost when healthy, graceful in every failure case. No toggle. Degraded mounts require explicit opt-in (--allow-degraded or autoUnlock.allowDegraded) to prevent silent zero-redundancy operation.
Decision
Option 3. Resilience is the default, not an option.
Implementation
LUKS unlock is strictly stage-2 — braid-unlock or braid-auto-unlock opens LUKS and mounts the pool. The module does not generate boot.initrd.luks.devices, data-pool fileSystems entries, or LUKS device declarations. The pool is brought online entirely by the CLI at runtime.
Resilience mechanisms:
- No boot-blocking mount units: The module generates no data-pool fileSystems or LUKS entries. The CLI (braid unlock) opens LUKS and mounts btrfs directly with a plain mount call, so nothing referencing data drives can block boot. Mounting outside systemd also sidesteps the SYSTEMD_READY=0 udev quirk (systemd/systemd#36886): a missing btrfs member can mark surviving devices not-ready and stall a systemd-initiated mount – the exact failure resilience-by-default exists to prevent. Related coverage: tests/repro/udev-missing-disk-{io,idle}.py exercise udev events when a member disappears from an already-mounted pool, characterizing disappearance signals rather than the SYSTEMD_READY=0 mount-gating path. (The one build-time fileSystems entry is the optional autoUnlock USB-key mount at /run/braid-key/mnt, marked noauto/nofail so it never blocks boot and references the key device, not the pool.)
- Degraded mount: Requires explicit --allow-degraded (or autoUnlock.allowDegraded for unattended use) – braid refuses to silently mount with zero redundancy.
Three-tier failure model
| Scenario | What happens | User sees |
|---|---|---|
| All drives healthy | Normal boot | Everything works |
| One drive dead | braid unlock refuses by default; user must pass --allow-degraded or configure autoUnlock.allowDegraded | Pool stays locked until explicit opt-in |
| All drives dead / no pool.json | braid unlock fails (no devices to probe) | System boots, SSH works, no /mnt/storage |
Identity enforcement
braid unlock uses authoritative pool membership from pool.json and probes only those configured members. --allow-degraded only bypasses degraded-mount refusal; it does not change which disks are considered pool members.
Constraint
This is not configurable. There is no braid.resilient option. Every braid deployment gets resilient boot.
See
- modules/braid/storage.nix – braid-online.service, braid-pool.target
- tests/module/ – module tests validate boot with all drives healthy
- archive/plans/test-boot-degraded.md – original plan and research (preserved in git history; last present at commit 9df91f9)
Decision: Single Passphrase
Principle: Single passphrase
Context
braid unlock and braid add prompt for a passphrase that unlocks all LUKS devices. If each drive had a different passphrase, the user would need to type N passphrases on every unlock. The UX must be: one passphrase, all drives unlock.
Options considered
- Shared keyfile on boot disk — store a keyfile on the SSD, encrypt the SSD with a passphrase. Unlocking the SSD exposes the keyfile, which unlocks data drives. More complex boot chain, keyfile is at-rest on disk.
- Same passphrase, no enforcement — tell users to use the same passphrase. They’ll forget or mistype. Boot breaks silently.
- Same passphrase, enforced at format time: braid add verifies the passphrase matches relevant existing pool members before formatting. Catches mismatches immediately.
Decision
Option 3. Enforcement at format time.
How it works
- First disk: prompt for passphrase twice (confirm match). Standard new-passphrase flow.
- Subsequent disks: prompt once, then verify every reachable existing LUKS device that will remain in or enter post-operation pool membership via cryptsetup luksOpen --test-passphrase. Fresh-format disks are excluded because they have no existing slot 0. The live-replace source is excluded when other retained members exist, so a divergent slot 0 on the disk being replaced does not block its own replacement. If verification fails, refuse to proceed with a clear error.
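The selection rule for verification targets can be sketched as a filter over the planned disks. This is a minimal sketch under stated assumptions: `Role`, `Disk`, and `verification_targets` are hypothetical names, and "other retained members exist" is read here as "any disk with the retained role is in the plan".

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Role {
    Retained,          // existing member that stays in the pool
    Entering,          // existing LUKS device joining the pool
    FreshFormat,       // will be formatted, so it has no slot 0 yet
    LiveReplaceSource, // live disk that is being replaced
}

struct Disk {
    name: &'static str,
    role: Role,
    reachable: bool,
}

/// Returns the disks whose slot 0 should be tested against the passphrase.
fn verification_targets(disks: &[Disk]) -> Vec<&'static str> {
    // Assumption: any retained member is enough to exclude the replace source.
    let has_retained = disks.iter().any(|d| d.role == Role::Retained);
    disks
        .iter()
        .filter(|d| d.reachable)
        .filter(|d| match d.role {
            Role::Retained | Role::Entering => true,
            Role::FreshFormat => false,
            // Verified only when it is the sole existing credential holder.
            Role::LiveReplaceSource => !has_retained,
        })
        .map(|d| d.name)
        .collect()
}
```

Each returned name would then be checked with the passphrase test mentioned above before any disk is opened or mutated.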
Finding a verification target
The CLI reads which devices are in the btrfs pool and verifies the supplied passphrase against each relevant underlying LUKS device before opening or mutating disks. The same result-membership rule applies to keyfile credentials used by mount, unlock, and recover: every planned LUKS target is verified before any mapper is opened with that credential.
For add, this widened preflight changes one mixed-failure precedence case. If a non-first pool member has a divergent slot 0 and a closed PresentLuks { mapper_open: false } candidate would later surface a foreign-FSID or no-btrfs identity error during execute Pass 1, the pool-member credential error now wins. The old identity-first ordering in that shape came from the former first-disk-only verify and was not a documented invariant. A divergent slot 0 is a pool-wide integrity issue that affects future operations; surfacing it before the candidate’s one-off selection error is intentional.
Identity errors found during planning are unchanged. The add work-plan builder still validates every PresentLuks candidate’s braid label and classifies already-open candidates before AddPlan::execute runs, before any passphrase read, and before the widened verify.
Scope
This decision governs the shared passphrase: one passphrase, enrolled in LUKS key slot 0 on every pool disk, enforced at format time. Additional unlock mechanisms (USB keyfiles, TPM, etc.) are orthogonal — they use separate LUKS key slots and do not weaken or replace the passphrase requirement.
See
- cli/src/: passphrase prompt and verification logic in the Rust CLI
- design-docs/1-braid-add-disk.md: original script design (preserved in git history; last present at commit 4112e57)
Decision: Sane Defaults
Context
Braid should protect the user’s data without requiring them to read through NixOS options to find features worth enabling. If a setting is something every NAS should have, braid should turn it on automatically.
The guiding question: would a knowledgeable admin always enable this? If yes, braid enables it by default.
Decision
Braid sets opinionated defaults two ways: lib.mkDefault for simple pass-through defaults on stable NixOS options, and a braid.* wrapper option when the feature is inside braid’s product boundary and benefits from lifecycle control, discoverability, or a unified config surface — even if the mapping is 1:1. The two cases below say which applies.
When to use mkDefault (don’t wrap)
Use lib.mkDefault to set an underlying NixOS option directly when:
- The NixOS option is stable and well-known — wrapping it adds no clarity.
- The meaning doesn’t change if braid’s internals change.
- The mapping is 1:1 and braid doesn’t need lifecycle control.
The user overrides by setting the NixOS option in their own config. mkDefault gives way automatically.
When to wrap in a braid option
Create a braid.* option when:
- One braid option maps to many underlying options. For example, braid.autoUnlock sets a fileSystems mount entry for the USB key, a braid-auto-unlock.service, systemd.tmpfiles rules, and assertions.
- The underlying tech could change: the abstraction survives an implementation swap.
- The raw option requires braid-specific context. For example, the pool membership encodes LUKS + mapper naming conventions. Exposing the raw options would require the user to understand braid’s internals.
- The mapping is non-obvious or must stay in sync. For example, if braid supported multiple pools, scrub fileSystems would need to track all mount points automatically.
- Braid needs lifecycle control: the feature must be tied to the pool’s online state, not the host system’s always-on timers. Example: braid.autoScrub wraps a 1:1 mapping but needs the timer bound to braid-online.service so Persistent=true catches up missed scrubs after unlock.
Defaults applied
| Setting | Value | Rationale |
|---|---|---|
| braid.autoScrub.enable | true | Scrub detects bit rot before it compounds. Every NAS should do this. Wrapped in a braid option for lifecycle binding to braid-online.service. |
| braid.autoScrub.interval | "monthly" | Btrfs community consensus. Weekly is aggressive for spinning disks; quarterly risks undetected corruption on a small RAID1. TrueNAS defaults to weekly (ZFS); Synology doesn’t enable it by default. Monthly is the sweet spot. |
| braid.poolAccessGroup | "storage" | Mount root set to root:storage 2770. Users in the group can read/write the mount root. Setgid ensures new entries inherit the group. Same pattern as TrueNAS/OMV. Does not override per-file umask. |
Alternatives considered
Wrap scrub in braid.autoScrub
Accepted (reversed). Initially rejected because wrapping a 1:1 mapping seemed like unnecessary indirection. Reversed because braid needs lifecycle control over the scrub timer: the timer must be bound to braid-online.service so it only runs while the pool is online, and Persistent=true can catch up missed scrubs on unlock. The nixpkgs services.btrfs.autoScrub timer fires on calendar boundaries regardless of pool state, causing silent failures when the pool is locked.
Don’t enable scrub by default
Rejected. This is what Synology and Unraid do — scrub is opt-in. Users who don’t know about scrub never enable it. Braid’s philosophy is that data integrity features should be on by default.
Weekly scrub (TrueNAS default)
Rejected. TrueNAS runs ZFS on always-on servers. Braid targets home NAS with spinning disks where weekly scrubs add unnecessary wear and noise. Monthly catches bit rot well before it can compound across a 2-3 drive RAID1.
See
- modules/braid/options.nix: declares the option defaults (braid.autoScrub, braid.poolAccessGroup)
- modules/braid/storage.nix: realizes braid.autoScrub into the scrub lifecycle units (braid-scrub timer/service and braid-scrub-resume-trigger), all bound to braid-online.service
- cli/src/online_state.rs: mark_online() applies the mount-root permissions from braid.poolAccessGroup (root:<group> 2770)
- Resilient by default: related philosophy (protect by default, no toggles)
Decision: NixOS-native
Context
Braid is a NixOS module. It only targets NixOS — no portability goal for other distros, container runtimes, or generic Linux. Every design decision can assume the full NixOS ecosystem is available.
Decision
Braid follows NixOS module conventions — same option types, module patterns, and idioms as nixpkgs. When in doubt, nixpkgs is the tiebreaker.
What this means in practice
- Options use standard lib.mkOption types (lib.types.listOf, lib.types.attrsOf, lib.types.submodule, etc.), not custom validation or string parsing.
- Activation uses systemd units, not custom init scripts or cron jobs.
- Dependencies use systemd ordering (after, wants, requires), not polling or sleep loops.
- Defaults use lib.mkDefault/lib.mkForce priority, not conditional logic.
- Config generation uses NixOS module merge semantics, not imperative file templating.
- No portability shims. Use NixOS mechanisms (boot.initrd.network, environment.etc, systemd.services) directly.
Tiebreaking
When two approaches both work:
- Check how nixpkgs modules handle the same problem.
- If no precedent, prefer whichever composes better with the NixOS module system.
Alternatives considered
Support other distros via abstraction layers
Rejected. Braid’s value comes from deep NixOS integration — declarative disk config, reproducible builds, VM-tested infrastructure. The target user already runs NixOS.
Use generic Linux tooling where possible
Rejected. “Generic” means reimplementing what NixOS already provides (shell scripts instead of systemd units, config files instead of NixOS options) — more maintenance, no atomicity or rollback.
See
- Sane defaults: use lib.mkDefault instead of braid-specific wrappers
- Config-first workflow: NixOS rebuild as the entry point
Superseded by 012-intent-cli.md and 017-runtime-disk-membership.md.
Decision: Disk Pool Management
Principle: CLI-owned membership
Context
braid-add-disk exists and is tested. The pool still needs graceful disk removal, status reporting, and a clear replace workflow. All operations must follow the same config-first pattern: edit braid.disks → nixos-rebuild switch → run CLI tool.
Principle: config-first applies symmetrically
Config-first is not just for adding disks. Every pool mutation follows the same workflow:
- Add: declare disk in braid.disks → rebuild → braid-add-disk
- Remove: remove disk from braid.disks → rebuild → braid-remove-disk
- Replace: remove dead disk + add replacement in braid.disks → rebuild → braid-add-disk
Symmetric guards enforce this:
- braid-add-disk refuses disks not in config
- braid-remove-disk refuses disks still in config
braid-remove-disk spec
Three-tier logic
- Target mapper exists and is open, verified to map to the requested by-id disk → graceful btrfs device remove /dev/mapper/xxx (migrates data off the device)
- Target is absent/unopenable and pool shows a missing device → btrfs device remove missing
- Otherwise → fail with clear diagnostic
Graceful remove is preferred when possible. It avoids relying on RAID1 reconstruction and eliminates ambiguity if more than one device is missing.
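The three tiers above map naturally onto a single match. The following is a hedged sketch rather than the script's real logic: `RemovePlan`, `plan_remove`, and the diagnostic strings are made up for illustration.

```rust
#[derive(Debug, PartialEq)]
enum RemovePlan {
    /// Tier 1: btrfs device remove on the open mapper (data migrates off).
    Graceful { mapper: String },
    /// Tier 2: btrfs device remove missing.
    RemoveMissing,
    /// Tier 3: refuse with a diagnostic.
    Fail(&'static str),
}

/// `open_mapper` is Some(path) when the target's mapper is open;
/// `mapper_matches_disk` says whether it was verified against the by-id disk.
fn plan_remove(open_mapper: Option<&str>, mapper_matches_disk: bool, pool_has_missing: bool) -> RemovePlan {
    match open_mapper {
        Some(m) if mapper_matches_disk => RemovePlan::Graceful { mapper: m.to_string() },
        None if pool_has_missing => RemovePlan::RemoveMissing,
        Some(_) => RemovePlan::Fail("open mapper does not map to the requested by-id disk"),
        None => RemovePlan::Fail("target absent and pool reports no missing device"),
    }
}
```

Ordering the graceful arm first reflects the stated preference: when the disk is still usable, data is migrated off it instead of being reconstructed from the RAID1 mirror.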
LUKS cleanup
After btrfs remove, cryptsetup close the mapper. Best-effort:
- Success → print “disk fully released” (safe to physically pull)
- Failure (busy) → print actionable next steps (lsof/fuser + retry), exit non-zero
No passphrase required
Remove does not need a passphrase. The disk is already unlocked or already gone. Root access + config guard + typed confirmation is sufficient.
Confirmation
Normal remove (pool stays RAID1 with 2+ disks):
Type 'remove this disk' to confirm:
Removing would drop below 2 disks (losing redundancy):
WARNING: This leaves 1 disk with no RAID1 redundancy.
A single disk failure will cause data loss.
Type 'remove this disk without redundancy' to confirm:
Warn but allow dropping to 1 disk — consistent with the single-disk start story.
Reboot-in-between safety
If the user reboots between nixos-rebuild switch (which removes the LUKS entry) and running braid-remove-disk, the disk won’t auto-unlock. This is safe: principle #1 (resilient boot) ensures the system boots and is reachable via SSH. The pool requires explicit --allow-degraded (or autoUnlock.allowDegraded) to mount degraded. The CLI handles both paths (tier 1 if disk is still somehow open, tier 2 if it’s absent).
braid-status spec
Default output
Pool health summary: drive count, RAID profile, total/used/free capacity, degraded/missing state, last scrub result. Per-disk detail: model, serial, mapper name, btrfs devid, read/write/corruption error counters, LUKS UUID, present/missing state.
--json
Machine-readable output for monitoring and automation.
Replace workflow
Replace uses braid-add-disk, which already auto-evicts missing devices during rebalance.
Workflow:
- Remove dead disk from braid.disks, add replacement
- nixos-rebuild switch
- sudo braid-add-disk /dev/disk/by-id/<new-disk>
Auto-evict is specifically for missing/dead devices. Planned removal of a healthy disk uses braid-remove-disk.
Future vision
Document only — do not build yet.
Unified CLI: braid plan (dry-run diff of config vs live state), braid apply (execute with checkpoints and resumability), braid status, braid replace-disk <old> <new>.
Phased roadmap:
- Ship braid-remove-disk and braid-status (solid primitives)
- Read-only planner (braid plan)
- Executor with checkpoints (braid apply)
- First-class braid replace-disk
- braid-status --json
Nix config remains source of truth throughout. The workflow evolves from edit → rebuild → script to edit → rebuild → plan → apply, but the principle is unchanged.
CLI shape
Separate scripts (braid-add-disk, braid-remove-disk, braid-status) — not a unified CLI yet. The unified braid command is future work that depends on proven primitives.
See
- modules/braid/options.nix: braid.disks option definition
- modules/braid/storage.nix: config export and LUKS entry generation
- docs/design/decisions/002-config-first-workflow.md: original config-first decision
Superseded by 012-intent-cli.md.
Decision: Unified CLI with Plan/Apply
Principle: CLI-owned membership (successor)
Context
Braid had three standalone scripts (braid-add-disk, braid-remove-disk, braid-status). Each handled one operation with its own validation, pool probing, and confirmation flow. The config-first workflow (edit config → rebuild → run script) was sound, but operators had to choose the right script and remember its flags. All three are now replaced by the unified Rust CLI.
A unified braid command with plan (dry-run diff) and apply (execute with checkpoints) replaces the multi-script mental model with one flow: edit config → rebuild → plan → apply.
Options considered
- Keep separate scripts: add braid-plan as a fourth script. Simple but doesn’t unify the execute path or add checkpoint/resume.
- Go binary: full rewrite in Go. Better for complex state machines, but high migration risk and slower delivery for equivalent behavior.
- Bash+jq unified script: single braid dispatcher with subcommands. Reuses existing tested patterns. JSON plan/checkpoint formats work with jq.
Decision
Option 3. Initial implementation was bash+jq. Now replaced by Rust CLI (cli/).
Architecture
Rust CLI (cli/src/) with subcommand dispatcher:
- braid init-disk <by-id> [--force] [--config <path>]: destructive one-shot that LUKS-formats a declared disk. Requires explicit operator intent. Never called from apply.
- braid plan [--json] [--allow-remove-missing] [--allow-remove-ambiguous] [--config <path>]: read-only diff of desired state (config) vs live state (LUKS/btrfs/mounts). Outputs action list with status (applicable/blocked), warnings, and blocked reasons.
- braid apply [--resume] [--allow-remove-missing] [--allow-remove-ambiguous] [--config <path>]: executes plan with checkpoint persistence. --resume continues from /var/lib/braid/apply-state.json. Never performs luksFormat.
- braid status [--json] [--config <path>]: pool health summary with per-disk detail (replaces braid-status).
- braid doctor [--json] [--config <path>]: run diagnostic checks against config and pool state (config file, schema, permissions, declared disks, data/metadata profile consistency). Reports ok/warn/fail per check.
Packaged via Crane + makeWrapper in flake.nix.
Hard boundary
cryptsetup luksFormat is forbidden in the plan and apply code paths. Only init-disk may invoke luksFormat. See 009-safe-by-construction-reconciliation.md.
Plan status model
Plan JSON includes:
- status: applicable (can be executed) or blocked (requires operator action first)
- blocked_reasons[]: list of reasons the plan cannot proceed (e.g., INIT_REQUIRED, IDENTITY_AMBIGUOUS_ABSENT_DISK)
- warnings[]: non-blocking issues (e.g., DISK_ABSENT_SKIPPED, INIT_REQUIRED, POOL_DEGRADED)
- confirmations[]: actions requiring explicit operator confirmation (e.g., redundancy loss). When multiple confirmations are required, provide all phrases semicolon-separated in BRAID_CONFIRM (e.g., BRAID_CONFIRM='phrase one;phrase two'). Whitespace around semicolons is trimmed.
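The BRAID_CONFIRM format described above (semicolon-separated phrases, whitespace trimmed) can be parsed in a few lines. This sketch is not the CLI's implementation: `parse_confirmations` and `all_confirmed` are hypothetical helpers, and dropping empty segments is an assumption the source does not state.

```rust
/// Split a BRAID_CONFIRM-style value into trimmed phrases.
fn parse_confirmations(raw: &str) -> Vec<String> {
    raw.split(';')
        .map(str::trim)
        // Assumption: empty segments (e.g. a trailing ';') are ignored.
        .filter(|phrase| !phrase.is_empty())
        .map(String::from)
        .collect()
}

/// True when every required phrase was supplied verbatim.
fn all_confirmed(required: &[&str], raw: &str) -> bool {
    let given = parse_confirmations(raw);
    required.iter().all(|r| given.iter().any(|g| g == r))
}
```

Matching whole phrases, not substrings, means a partially typed confirmation never satisfies a requirement.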
Plan/apply state machine
- braid plan produces a JSON plan (action list with types, targets, preconditions)
- braid apply runs the planner internally, writes checkpoint, executes actions in order
- Each action updates the checkpoint atomically (write tmp + mv)
- On success: checkpoint moves to /var/lib/braid/history/<plan_id>.json, active file removed
- On failure: checkpoint stays for --resume
- --resume verifies config hash matches before continuing
- On resume, absent action targets fail with RESUME_TARGET_MISSING (strict in-flight integrity)
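The "write tmp + mv" checkpoint update can be sketched with the standard library alone. This is an illustration of the general pattern, not braid's code: the function name and the `.json.tmp` suffix are assumptions.

```rust
use std::fs;
use std::io::Write;
use std::path::Path;

/// Replace `path` with `json` so readers see either the old or the new file,
/// never a half-written one.
fn write_checkpoint_atomic(path: &Path, json: &str) -> std::io::Result<()> {
    // Hypothetical temp name: "apply-state.json" becomes "apply-state.json.tmp".
    let tmp = path.with_extension("json.tmp");
    {
        let mut file = fs::File::create(&tmp)?;
        file.write_all(json.as_bytes())?;
        // Flush data to disk before the rename makes it visible.
        file.sync_all()?;
    }
    // rename() replaces the destination atomically when both paths
    // are on the same filesystem.
    fs::rename(&tmp, path)
}
```

Because the temp file lives next to the destination, the rename stays within one filesystem, which is what makes the swap atomic on Linux.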
Action types
- OPEN_LUKS: open existing LUKS device (non-destructive)
- ADD_DISK_BTRFS_ADD: add mapper to btrfs pool
- BALANCE_TO_RAID1: convert pool to RAID1 profile
- REMOVE_DISK_GRACEFUL: btrfs device remove (data migrates)
- REMOVE_DISK_MISSING_EXPLICIT: btrfs device remove missing (requires --allow-remove-missing + BRAID_CONFIRM)
- CLOSE_LUKS_MAPPER: cryptsetup close
- VERIFY_POOL_HEALTH: confirm pool state matches expectations
- VERIFY_EXPECTED_DISK_SET: confirm pool members match config
Checkpoint schema
Active: /var/lib/braid/apply-state.json
History: /var/lib/braid/history/<plan_id>.json (last 20 retained)
Backward compatibility
braid-add-disk is now an error stub that directs operators to braid init-disk + braid apply. braid-status is deleted (replaced by braid status). braid-remove-disk remains as a standalone legacy script (not yet ported to the Rust CLI).
Constraint
Two commands (plan then apply) instead of one. This is intentional — deterministic dry-run before mutation prevents accidents.
See
- docs/design/decisions/002-config-first-workflow.md: config-first principle this builds on
- docs/design/decisions/009-safe-by-construction-reconciliation.md: destructive boundary principle
- docs/design/decisions/007-disk-pool-management.md: existing pool management spec
- cli/src/: Rust CLI implementation
Superseded by 012-intent-cli.md.
Decision: Safe-by-Construction Reconciliation
Principle: Safe-by-construction operations
Context
braid apply originally mixed two fundamentally different operation classes:
- One-time destructive initialization: cryptsetup luksFormat destroys all data on the target device. It is not idempotent. Running it twice destroys a working LUKS volume.
- Repeatable reconciliation: cryptsetup luksOpen, btrfs device add/remove, balance, verify. These are safe to run repeatedly.
The structural hazard: state ambiguity (temporarily absent disk vs truly new disk) can route execution toward formatting. A disk that was unplugged and replugged could be misidentified as “new” and reformatted, destroying data.
Options considered
- Registry-based — track disk identity in a persistent registry to distinguish “new” from “returning”. Adds hidden state, drift risk, and recovery complexity.
- Config flags: add lifecycle state (new/existing/replace) to NixOS config. Violates declarative end-state principle because config becomes imperative.
- Structural separation: move destructive operations to a separate command that requires explicit operator intent. apply physically cannot format.
Decision
Option 3. Hard boundary between destructive initialization and safe reconciliation.
Architecture
- braid init-disk <by-id>: the only command that may call cryptsetup luksFormat. Requires the disk to be declared in config, not already LUKS-formatted (unless --force), and not in an active pool. Enforces single-passphrase invariant.
- braid plan / braid apply: reconciliation only. Emits OPEN_LUKS (non-destructive open), never luksFormat. Non-LUKS disks produce an INIT_REQUIRED warning telling the operator to run init-disk first.
Hard boundary enforcement
cryptsetup luksFormat is forbidden in the plan/apply code path. This is verified by:
- Code inspection: no luksFormat call exists in compute_plan(), executor dispatch, or any function reachable from cmd_apply().
- Test assertion: braid-apply.py includes an explicit test that apply never contains luksFormat.
- The ADD_DISK_LUKS_FORMAT_OPEN action type has been removed from the planner and executor entirely.
Missing-disk policy
Absent configured disks are skipped with a DISK_ABSENT_SKIPPED warning. The plan remains applicable — other safe operations proceed. This prevents a temporarily disconnected disk from blocking all reconciliation.
Missing pool devices (devices in btrfs but not in config) require explicit operator intent to evict: --allow-remove-missing flag plus BRAID_CONFIRM='remove missing device from pool' environment variable. This prevents accidental eviction of temporarily absent disks.
Device identity is established by LUKS UUID, not by path or mapper name. When a config disk is absent, its UUID is unknowable, creating identity ambiguity for removal decisions. If the planner wants to remove a pool device but cannot verify it doesn’t match an absent config disk, the plan is blocked with IDENTITY_AMBIGUOUS_ABSENT_DISK. The operator can override with --allow-remove-ambiguous plus BRAID_CONFIRM='remove despite ambiguous identity'.
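The ambiguity rule can be sketched as follows: as soon as any configured disk is absent, its UUID is unknown, so a removal candidate cannot be proven distinct from it. The types and function below are hypothetical; only the reason code, the flag, and the confirmation phrase come from the text above.

```rust
#[derive(Debug, PartialEq)]
enum RemovalDecision {
    Proceed,
    Blocked(&'static str),
}

/// A configured disk at plan time. `luks_uuid` is None when the disk is
/// absent, because the UUID can only be read from the device itself.
struct ConfigDisk {
    luks_uuid: Option<&'static str>,
}

const AMBIGUOUS_PHRASE: &str = "remove despite ambiguous identity";

/// Decide whether a pool device that matches no present config disk may be removed.
fn decide_removal(config: &[ConfigDisk], allow_remove_ambiguous: bool, confirm: Option<&str>) -> RemovalDecision {
    let any_absent = config.iter().any(|d| d.luks_uuid.is_none());
    if !any_absent {
        // Every config UUID is known, so the candidate is provably not in config.
        return RemovalDecision::Proceed;
    }
    if allow_remove_ambiguous && confirm == Some(AMBIGUOUS_PHRASE) {
        RemovalDecision::Proceed
    } else {
        RemovalDecision::Blocked("IDENTITY_AMBIGUOUS_ABSENT_DISK")
    }
}
```

Requiring both the flag and the phrase keeps a stray environment variable or a copied command line from being enough on its own.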
Resume strictness
Fresh apply is tolerant of absent disks (skip + warn). But checkpointed in-flight actions are strict: if a pending action’s target becomes absent during --resume, the apply fails with RESUME_TARGET_MISSING. The checkpoint is preserved for retry after the target is restored.
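The asymmetry between fresh apply and resume fits in a two-input table. A minimal sketch, with `TargetCheck` and `check_target` as illustrative names; the two codes are the ones named in this decision.

```rust
#[derive(Debug, PartialEq)]
enum TargetCheck {
    Run,
    SkipWithWarning(&'static str),
    Fail(&'static str),
}

/// `present`: is the action's target disk attached right now?
/// `resuming`: is this action being replayed from a checkpoint?
fn check_target(present: bool, resuming: bool) -> TargetCheck {
    match (present, resuming) {
        (true, _) => TargetCheck::Run,
        // Fresh apply: tolerate the absence and keep reconciling the rest.
        (false, false) => TargetCheck::SkipWithWarning("DISK_ABSENT_SKIPPED"),
        // Resume: the checkpoint promised this action, so absence is an error.
        (false, true) => TargetCheck::Fail("RESUME_TARGET_MISSING"),
    }
}
```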
Constraint
Two commands (init-disk + apply) instead of one. Operators must explicitly initialize each new disk before reconciliation can include it. This is the minimum viable approach given that LUKS formatting is destructive and non-idempotent.
Revisit trigger
If NixOS or LUKS ever provides an idempotent “ensure formatted” primitive that is safe to run on an already-formatted device, the separation can be revisited.
See
- cli/src/: Rust CLI (init-disk, plan, apply, status)
- 002-config-first-workflow.md: config-first principle
- 008-unified-cli.md: plan/apply architecture and action types
Decision: Toolchain pinning
Context
Braid’s parser-critical runtime tools (btrfs-progs, cryptsetup, util-linux, NUT, smartmontools, ethtool) are parsed by the Rust CLI. Output formats change between tool versions – a flake update to nixpkgs-unstable could silently break parsers. Generic helpers (coreutils, systemd) are used for basic system operations and are outside braid’s parser contract. Browse has one tolerant UI-only systemd exception: it parses systemctl list-units --output=json for a picker and falls back to raw output on parse failure.
Decision
Pin flake.nix to a specific NixOS stable release (nixos-26.05). Pin only parser-critical tools — those whose output braid parses or whose behavior is part of braid’s correctness model. Generic helpers come from the consumer’s system package set.
How it works
- Flake input: nixpkgs.url = "github:NixOS/nixpkgs/nixos-26.05" is braid’s own pinned channel, and the source of parser-critical tool packages unless the consumer redirects braid’s nixpkgs input (see the follows note below).
- Module options: braid.packages.* (cryptsetup, btrfsProgs, utilLinux, nut, smartmontools, ethtool) default to braid’s nixpkgs flake input but can be overridden per-system.
- PATH wrapping: The wrapper injects cfg.packages.* into PATH. Generic helpers (coreutils, systemd) are resolved from the consumer’s pkgs, not pinned.
- Two wrapping sites: flake.nix wraps with pkgs.* defaults (for nix run and tests); the module wraps cfg.package with cfg.packages.* (for deployed NixOS systems where package options may be overridden).
Consumer follows decides the actual source
nixosModules.default builds the braid.packages.* defaults with import self.inputs.nixpkgs – braid’s nixpkgs flake input, instantiated cleanly (no consumer overlays). Whether the consumer sets braid.inputs.nixpkgs.follows = "nixpkgs" decides where the pinned tools actually come from.
The recommended default is no follows. With no follows, braid’s nixpkgs input stays on its pinned nixos-26.05, so the pinned tools resolve from braid’s release channel and braid-cli-unwrapped matches the exact binary the release cache publishes – a cache hit instead of a from-source rebuild on the NAS. ADR 029 is the authoritative home for that cache-path-identity rationale; the short version is that follows rebuilds braid against the consumer’s nixpkgs, changing the store path and forcing a recompile.
follows = "nixpkgs" is a valid advanced opt-out (smaller closure via nixpkgs dedup), but it redirects braid’s nixpkgs input to the consumer’s nixpkgs, so the pinned tools then resolve from the consumer’s nixpkgs. The pin therefore guarantees stable parser output only while the consumer’s nixpkgs stays on the same NixOS stable release braid targets (currently nixos-26.05). Within one stable release tool output formats change only for security fixes, so a consumer aligned on braid’s release is safe; a consumer who bumps nixpkgs ahead of braid moves the storage toolchain with it and re-introduces the parser-drift risk this decision otherwise prevents. If you do opt into follows, mitigate by keeping nixpkgs aligned with braid’s release or pinning braid.packages.*.
Operational escape hatch
Parser-critical tools are pinned by default to the flake’s nixpkgs release, but braid.packages.* overrides are intentional – operators may need a newer upstream version for urgent bugfixes or security patches before braid’s next nixpkgs bump. The override takes precedence. Operator-set braid.packages.* overrides sit outside braid’s committed parser contract: the standard fixture-capture and golden-test recipes build fixed flake checks against the flake’s pkgs, so they do not validate an arbitrary override. Treating an override as supported requires a maintainer to reproduce the fixture-refresh workflow under a temporary local input swap (e.g. --override-input nixpkgs on the capture/test commands, or a local flake edit) at the override’s package version, then re-run just test-rust against the resulting fixtures. Operators who skip this step are running unverified parser inputs.
Classification guideline
Pin when: braid parses the tool’s output, or the tool’s behavior is part of braid’s correctness/safety model.
Use system pkgs when: the tool is a generic helper, braid doesn’t parse its output as a correctness contract, and version drift is unlikely to affect correctness. The Browse Systemd picker is a UI-only exception because it parses systemctl list-units --output=json tolerantly and disables drill-in on parse failure.
New runtime dependencies must be classified into one of these two groups when added.
| Tool | Pinned by default? | Overridable? | Reason |
|---|---|---|---|
| btrfs-progs | Yes | Yes (braid.packages.btrfsProgs) | Output parsed by nom combinators and serde JSON |
| cryptsetup | Yes | Yes (braid.packages.cryptsetup) | Output parsed by nom combinators |
| util-linux (lsblk) | Yes | Yes (braid.packages.utilLinux) | lsblk JSON output parsed by serde |
| NUT (upsc) | Yes | Yes (braid.packages.nut) | upsc key: value output parsed by parse_upsc for preflight safety and operator visibility |
| smartmontools | Yes | Yes (braid.packages.smartmontools) | smartctl --json output parsed by parse_smartctl |
| ethtool | Yes | Yes (braid.packages.ethtool) | Wake-on: line parsed by the doctor wake_on_lan check |
| coreutils | No — system pkgs | No option | chown/chmod/realpath/stat — output not parsed |
| systemd | No — system pkgs | No option | systemctl/ask-password commodity behavior; Browse’s list-units JSON picker is tolerant UI-only, not parser-critical |
Upgrading tools
A nixpkgs bump can move parser-critical tools to new output formats, so an upgrade must refresh fixtures and re-run every parser-validation lane – not just confirm tool provenance. These steps mirror the canonical sequence in dev/overview.md (“Refresh fixtures and run tests”); keep the two in sync.
- Bump the nixpkgs input to the next stable release and run nix flake update nixpkgs.
- Refresh fixtures: just capture-all-fixtures writes golden files under cli/tests/fixtures/nixos-<release>/ (with upsc/ holding the capture-ups-fixtures outputs). just capture-all-fixtures-unstable is the unstable-lane mirror.
- Run the parser-validation lanes, updating parsers/tests for any output that changed:
  - just test-rust: golden-fixture parser tests.
  - just test-parsers: live-tool parser canary.
  - just test-vm: VM suite. Its tool-versions check verifies provenance: each pinned tool resolves to a /nix/store/ path on the VM’s PATH and its self-reported version matches pkgs.<tool>.version from the same evaluation. This is provenance only: tool-versions does not detect that nixpkgs moved a tool to a new version (both sides advance together), so the fixture and parser tests above are the actual drift gate. Run it alone with just test-vm tool-versions for a quick provenance-only check.
NUT specifically: parse_upsc depends on the key: value shape emitted by pkgs.nut’s upsc client (see reference/nut/clients/upsc.c). A nixpkgs bump that touches networkupstools triggers the same fixture-refresh obligation as the other pinned tools – run just capture-ups-fixtures and just test-rust before merging. The braid-status-ups check under just test-parsers is the live-tool mirror of the golden fixtures.
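To make the parser dependency concrete, here is a minimal sketch of parsing a key: value shape like the one described above. It is not braid's parse_upsc; the function name and the decision to skip lines without a colon are assumptions made for this example.

```rust
use std::collections::BTreeMap;

/// Parse "key: value" lines into a map. Splits at the first ':' only,
/// so values that contain colons (paths, timestamps) survive intact.
fn parse_key_value_sketch(output: &str) -> BTreeMap<String, String> {
    output
        .lines()
        // Assumption: lines without a colon are ignored, not treated as errors.
        .filter_map(|line| line.split_once(':'))
        .map(|(key, value)| (key.trim().to_string(), value.trim().to_string()))
        .collect()
}
```

Even a parser this small depends on the separator and the one-pair-per-line layout, which is why a tool bump triggers the fixture refresh described above.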
ethtool specifically: wake_on_lan depends on the Supports Wake-on: and Wake-on: lines emitted by pkgs.ethtool. VM virtio NICs do not provide useful Wake-on-LAN state, so there is no live fixture-capture lane; parser coverage is hand-authored in Rust unit tests, and wrapper provenance is covered by the tool-versions and braid-auto-suspend VM tests.
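A hand-authored unit test for those two lines could be built around a parser like the one below. This is a sketch under assumptions: `WakeOnLan`, `parse_wake_on`, and `magic_packet_enabled` are hypothetical names, and the check relies on ethtool's documented convention that the letter g means wake on MagicPacket.

```rust
#[derive(Debug, PartialEq, Default)]
struct WakeOnLan {
    supports: Option<String>, // e.g. "pumbg"
    current: Option<String>,  // e.g. "g" or "d"
}

/// Extract the "Supports Wake-on:" and "Wake-on:" values from ethtool output.
fn parse_wake_on(output: &str) -> WakeOnLan {
    let mut wol = WakeOnLan::default();
    // ethtool indents these lines, so trim before matching prefixes.
    for line in output.lines().map(str::trim) {
        if let Some(value) = line.strip_prefix("Supports Wake-on:") {
            wol.supports = Some(value.trim().to_string());
        } else if let Some(value) = line.strip_prefix("Wake-on:") {
            wol.current = Some(value.trim().to_string());
        }
    }
    wol
}

/// True when the current mode includes 'g' (MagicPacket).
fn magic_packet_enabled(wol: &WakeOnLan) -> bool {
    wol.current.as_deref().map_or(false, |modes| modes.contains('g'))
}
```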
Alternatives considered
BRAID_*_BIN environment variables
Rejected. Adds a second resolution mechanism alongside PATH. Every callsite would need to check the env var, falling back to PATH. More complexity, same result — Nix already controls PATH.
Absolute paths in Rust (no PATH at all)
Rejected. Would require threading Nix store paths into the Rust binary at build time (via build.rs or env vars). Fragile and non-standard — NixOS convention is PATH wrapping via makeWrapper.
Stay on nixpkgs-unstable
Rejected. Unstable channel updates tool versions without notice. A routine nix flake update could change btrfs-progs output format and break parsers silently. Stable releases change only for security fixes.
Pin all runtime tools (blanket pinning)
Previously active, now superseded. Blanket pinning created unnecessary closure duplication for generic helpers (jq, coreutils) that braid does not parse. The braid.packages.coreutils option was also inconsistently wired — storage.nix used pkgs.coreutils directly, bypassing the option. Selective pinning is simpler and honest about what braid actually depends on.
See
- NixOS-native — follow NixOS conventions (PATH wrapping via makeWrapper)
- Release process – cache-path-identity rationale for the no-follows default
- Principle 10 in principles.md
Superseded by 012-intent-cli.md.
Two-Phase Apply (LUKS Pre-Phase)
Context
After a reboot, all LUKS mappers are closed and the btrfs pool is unmounted. The planner runs after probe, which sees no open mappers and no mounted pool. This causes two problems:
- Misleading plan display: braid plan shows mkfs.btrfs -f -d single -m dup (may run) for disks that are actually returning pool members. The execute-time superblock check prevents data loss, but the plan output is alarming for routine re-mounts.
- Mount failure after reboot: Per-device btrfs device scan <device> doesn’t reliably assemble multi-device pools. Even after opening all LUKS mappers and scanning each device individually, mount can fail with “missing members” because the kernel’s btrfs subsystem hasn’t been told about all members atomically. btrfs device scan (no arguments) scans all block devices and reliably assembles multi-device pools.
Decision
Move LUKS opening into a pre-phase that runs before plan generation. The sequence changes from:
old: checkpoint check → probe → plan → checkpoint → execute
new: checkpoint check → luks_prephase → probe → plan → checkpoint → execute
The luks_prephase function:
- Opens closed LUKS mappers – iterates config disks, skips absent and already-open, reads passphrase from --passphrase-stdin or TTY lazily (only when the first closed mapper is encountered).
- Scans all – runs btrfs device scan (no arguments) to register all open btrfs members with the kernel.
- Mounts pool – if the mount point is not already mounted, finds the first open mapper and attempts mount. Missing-members errors are tolerated (not all disks may be available). Hard errors propagate.
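The three steps above can be sketched as a pure function that records what the pre-phase would do. This is a minimal model, not braid's code: the Mapper and Step types and the function signature are hypothetical, and the tolerated missing-members mount error is not modeled.

```rust
/// Hypothetical view of one configured disk's mapper state.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Mapper {
    Absent,
    Closed,
    Open,
}

/// Hypothetical record of the work the pre-phase performs, in order.
#[derive(Debug, PartialEq)]
enum Step {
    ReadPassphrase,
    Open(usize),
    ScanAll,
    Mount,
}

/// Sketch of the pre-phase sequence: open closed mappers (reading the
/// passphrase lazily), scan all devices once, then mount if needed.
fn luks_prephase(disks: &mut [Mapper], already_mounted: bool) -> Vec<Step> {
    let mut steps = Vec::new();
    let mut have_passphrase = false;
    for (index, disk) in disks.iter_mut().enumerate() {
        if *disk != Mapper::Closed {
            continue; // skip absent and already-open disks
        }
        if !have_passphrase {
            // Only prompt once, and only when a closed mapper exists.
            steps.push(Step::ReadPassphrase);
            have_passphrase = true;
        }
        steps.push(Step::Open(index));
        *disk = Mapper::Open;
    }
    // One argument-less scan registers every open member with the kernel.
    steps.push(Step::ScanAll);
    if !already_mounted && disks.iter().any(|d| *d == Mapper::Open) {
        steps.push(Step::Mount);
    }
    steps
}
```

When every mapper is already open and the pool is mounted, the sketch neither reads a passphrase nor mounts; only the scan remains.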
After the pre-phase, probe sees accurate state: all available mappers are open, the pool is mounted (if all members are present) or truly empty (bootstrap). The plan has no OPEN_LUKS actions for available disks, and is_bootstrap is accurate.
Resume with closed LUKS
If braid apply --resume detects closed LUKS mappers (device exists but /dev/mapper/<name> does not), the checkpoint is invalidated and fresh_apply is called instead. This handles the case where a checkpoint was created pre-reboot and the system has since rebooted. The pre-phase in fresh_apply opens LUKS and re-probes, generating a correct plan.
This avoids the complexity of reconciling a stale checkpoint against post-reboot state. The ActionState state machine doesn’t allow Pending → Completed, so marking pre-reboot work as completed would require weakening type safety.
BtrfsDeviceScanAll
A new CmdRequest::BtrfsDeviceScanAll variant runs btrfs device scan with no arguments, scanning all block devices. This replaces per-device scans in the pre-phase and in the execute_btrfs_add bootstrap-with-existing-btrfs path.
Pre-Phase Side-Effect Policy
After the pre-phase, LUKS is open and the pool is mounted even if the planner subsequently returns Blocked. This is a change from the old invariant:
- Old: Blocked = no mutations.
- New: Blocked = no planned mutations, but LUKS/mount happened as a pre-condition for accurate planning.
This is correct operationally — the pool was supposed to be online — but operators should be aware that braid apply with a blocked plan still opens LUKS and mounts the pool. The passphrase is consumed before the plan is generated.
Alternatives Considered
Reconcile stale checkpoint on resume
Walk the old checkpoint, detect which actions completed pre-reboot (by checking mapper/mount state), and mark them completed. Rejected because:
- ActionState::transition_to doesn’t allow Pending → Completed
- Complex reconciliation logic with risk of incorrect state detection
- Simpler to invalidate and re-plan since pre-phase makes re-planning cheap
Keep LUKS opening in the execute phase
Could add btrfs device scan (no args) to the execute phase. Rejected because the planner would still see inaccurate state, generating misleading plans with mkfs.btrfs (may run) for returning pool members.
Dry-run LUKS open (check without opening)
Could probe LUKS UUIDs without opening to give the planner hints. Rejected as more complex than just opening — LUKS needs to be open anyway, so doing it early is simpler.
Active – Supersedes 008-unified-cli.md and 011-two-phase-apply.md.
Intent CLI
Context
Braid’s plan/apply reconciliation engine was over-engineered for NAS drives, which have ~4 events in their lifetime (create pool, add disk, add another, replace a dead one). The generic reconciler created problems:
- Risk flattening: routine reboot and adding a disk produced the same output format (a “plan” with “actions”)
- Combinatorial complexity: --allow-remove-missing, --allow-remove-ambiguous, BRAID_CONFIRM='phrase1;phrase2'
- Ceremony for routine operations: braid apply after every reboot
Decision
Replace plan/apply with five intent commands:
| Command | Purpose | Risk |
|---|---|---|
| braid add <name=by_id>... | Format + join pool, or recover identity-verified LUKS device | Destructive (new disk), safe (returning braid disk with matching FSID), or refused (non-braid LUKS, foreign pool, no pool to verify) |
| braid remove <name> | Migrate data off present disk, detach from pool | Long-running |
| braid remove-missing --missing-id <devid> | Clean up a stale missing-device entry; restores RAID1 profiles if this clears the last missing device | Long-running |
| braid replace --old <name> --new <name=by_id> | Replace a disk (live or dead) using btrfs replace start; restores RAID1 profiles for missing-path when clearing the last missing device | In-place swap (preserves devid) |
| braid status | Display pool health and disk info | Read-only |
Disk keys
Disk membership is CLI-owned runtime state in /var/lib/braid/pool.json (see 017-runtime-disk-membership.md). pool.json is keyed by LUKS UUID; the disk name is stored as presentation metadata. Disks are added with name=by_id syntax:
braid add toshiba=/dev/disk/by-id/ata-Toshiba_MN07_XXXX \
ironwolf=/dev/disk/by-id/ata-Ironwolf_ST12_YYYY
Mapper names are braid-<name> (e.g., braid-toshiba) — human-friendly, debuggable in lsblk/systemd logs, deterministic. They are runtime handles, not persistent identity.
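As an illustration of the name=by_id syntax and the braid-<name> mapper convention, here is a minimal parsing sketch. The DiskArg type, the function names, and the specific validation rules (non-empty name, /dev/disk/by-id/ prefix) are assumptions made for the example, not braid's actual parser.

```rust
/// Hypothetical parsed form of one `name=by_id` argument.
#[derive(Debug, PartialEq)]
struct DiskArg {
    name: String,
    by_id: String,
}

/// Split `name=by_id` at the first '=' and apply two illustrative checks.
fn parse_disk_arg(arg: &str) -> Result<DiskArg, String> {
    let (name, by_id) = arg
        .split_once('=')
        .ok_or_else(|| format!("expected name=by_id, got '{}'", arg))?;
    if name.is_empty() {
        return Err("disk name is empty".to_string());
    }
    // Assumption: only stable by-id paths are accepted.
    if !by_id.starts_with("/dev/disk/by-id/") {
        return Err(format!("'{}' is not a /dev/disk/by-id/ path", by_id));
    }
    Ok(DiskArg {
        name: name.to_string(),
        by_id: by_id.to_string(),
    })
}

/// Mapper names are derived from the disk name: braid-<name>.
fn mapper_name(name: &str) -> String {
    format!("braid-{}", name)
}
```

Because the mapper name is a pure function of the disk name, it stays deterministic across reboots while the LUKS UUID remains the persistent identity.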
Safety model
The old architecture used a structural code boundary — luksFormat was literally unreachable from apply. The new architecture replaces this with:
- Explicit operator intent: user specifies a disk key and confirms
- Layered identity check for existing LUKS devices:
  a. LUKS UUID is the persistent identity. LUKS label braid-<key> is an adoption-safety gate for returning disks; non-braid LUKS is refused outright.
  b. Pool must be mounted – bootstrap refuses existing LUKS (no pool to verify against).
  c. Opened mapper’s btrfs FSID must match the current pool – foreign-pool disks are refused.
  d. Braid-labeled LUKS with no btrfs superblock is refused – this state is ambiguous (clean eviction, partial init, manual wipe, stale data) and cannot be distinguished without tombstones.
  e. A braid-labeled LUKS disk with a btrfs superblock whose FSID matches the mounted pool may be accepted as a returned-disk add target. The add journal records the LUKS UUID before mutation. If the stale btrfs signature would block btrfs device add, braid runs only wipefs --all --types btrfs on the verified mapper and uses btrfs device add -f.
  f. Superblock guard is defense-in-depth on the FSID-matching path for existing-LUKS adds. The bootstrap path accepts only disks classified as fresh non-LUKS during add planning, and the LUKS open helpers verify that any pre-existing braid-<key> mapper is backed by the requested by-id disk before pool creation proceeds. mkfs.btrfs itself is invoked without -f, so its own signature check is the final fail-closed guard.
- Unified confirmation with device context: all mutating commands (add, remove, remove-missing, replace) show a rich device-info block (model, size, serial via lsblk) and confirm with Type 'yes' to continue:. Degraded-path warnings are informational text, not special confirmation phrases. --yes skips the prompt for scripting.
- Disk name immutability: mutating commands validate names against recorded disk identity and reject name rename/reassignment. Operators must use explicit replace or remove + add workflows instead of renaming.
- Journal-protected mutations: mutating commands write pending-op.json before the first irreversible step; it is cleared only after the full operation (including follow-up work like soft balance) succeeds. Existing-pool add, replace, and remove-missing journals are phased. Their PoolMutation phases may reconcile whether the primary btrfs membership mutation committed; their post-maintenance phases may only validate committed membership, repair pool.json, and finish owed resize/balance work. On any error exit, the journal persists to enable braid recover.
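The layered identity check for existing LUKS devices (items a to e) reads naturally as a decision function. The sketch below is a simplified model under stated assumptions: the ExistingLuks and AddDecision types, the field names, and the refusal strings are hypothetical, and the real check also involves the LUKS UUID, the journal, and the superblock guard.

```rust
/// Hypothetical outcome of classifying an existing LUKS device for `add`.
#[derive(Debug, PartialEq)]
enum AddDecision {
    AcceptReturning,
    Refuse(&'static str),
}

/// Hypothetical probe result for a device that already carries LUKS.
struct ExistingLuks<'a> {
    /// LUKS label, if any.
    label: Option<&'a str>,
    /// btrfs FSID seen on the opened mapper; None means no superblock.
    mapper_fsid: Option<&'a str>,
}

/// `pool_fsid` is None when no pool is mounted (bootstrap).
fn classify_existing_luks(key: &str, dev: &ExistingLuks, pool_fsid: Option<&str>) -> AddDecision {
    let expected_label = format!("braid-{}", key);
    // (a) The label is the adoption gate: non-braid LUKS is refused outright.
    if dev.label != Some(expected_label.as_str()) {
        return AddDecision::Refuse("non-braid LUKS");
    }
    // (b) Without a mounted pool there is nothing to verify against.
    let pool = match pool_fsid {
        Some(fsid) => fsid,
        None => return AddDecision::Refuse("no mounted pool to verify against"),
    };
    match dev.mapper_fsid {
        // (d) Braid label but no superblock is ambiguous.
        None => AddDecision::Refuse("braid-labeled LUKS without btrfs superblock"),
        // (e) Matching FSID: a returning pool member.
        Some(fsid) if fsid == pool => AddDecision::AcceptReturning,
        // (c) Any other FSID belongs to a foreign pool.
        Some(_) => AddDecision::Refuse("foreign pool"),
    }
}
```

Every branch except the FSID match refuses, which mirrors the fail-closed ordering of the list.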
--dry-run performs side-effect-free, passphrase-free LUKS probes only – LUKS label reads, and the keyfile credential test used by braid enroll (cryptsetup open --test-passphrase --key-file, which evaluates a credential without activating the device). Checks that require a passphrase or an open mapper – e.g. full identity verification (FSID comparison) – are deferred to execution time when the mapper is closed.
The dry-run preview itself stays on stdout. Side-effect-free probes that nevertheless do bound long-running work – specifically the Argon2-bounded --test-passphrase evaluation in braid enroll --dry-run – emit canonical [wait]/[ok]/[skip] status rows to stderr per Principle 13 (Announce long-running work). The previous “successful dry-run leaves stderr empty” contract is intentionally relaxed for this case: an Argon2 derivation runs whether or not the user can see it, and silent dry-runs that take seconds-to-minutes look like hangs. The structured preview output is unchanged.
Replace safety constraints
- --old accepts both live (present in pool) and dead/missing disks.
- Both paths use btrfs replace start – the sole replacement primitive. Live disks replace in-place; missing disks are rebuilt from RAID redundancy by devid.
- --missing-id is only valid when --old is dead/missing. Rejected with live --old. Validated against PoolState::missing_devids (live btrfs state via probe::probe_pool).
- The missing devid is auto-resolved from --old’s persisted pool.json devid, cross-checked against PoolState::missing_devids – independent of how many devices are missing. Because --old’s name already identifies the member, no missing-count gate is needed; --missing-id is an optional cross-check (it must equal the persisted devid, else OldDevidMismatch) and is never required.
- Mixed state (live --old + pool has missing devices) is rejected – operator must repair the missing device first with braid replace --old <missing-name> --new <new-name>=/dev/disk/by-id/<...>. braid remove-missing is only for intentional cleanup (forgetting stale entries without rebuilding data).
- No replacement path uses btrfs device add. Missing-path replace may run a post-commit soft RAID1 balance only when it clears the last missing device.
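The devid resolution rules above can be sketched as one function. Only OldDevidMismatch is named in the text; the other error variants, the function name, and the parameter shapes are hypothetical stand-ins chosen for the example.

```rust
use std::collections::BTreeSet;

/// Illustrative error set; only `OldDevidMismatch` appears in the text above.
#[derive(Debug, PartialEq)]
enum ReplaceError {
    MissingIdWithLiveOld,
    MixedState,
    OldNotMissing { devid: u64 },
    OldDevidMismatch { persisted: u64, given: u64 },
}

/// Returns Ok(None) for a live-path replace and Ok(Some(devid)) for a
/// missing-path replace, where `persisted_devid` comes from pool.json.
fn resolve_missing_devid(
    old_is_live: bool,
    persisted_devid: u64,
    missing_devids: &BTreeSet<u64>,
    missing_id: Option<u64>,
) -> Result<Option<u64>, ReplaceError> {
    if old_is_live {
        // --missing-id is only valid when --old is dead/missing.
        if missing_id.is_some() {
            return Err(ReplaceError::MissingIdWithLiveOld);
        }
        // Live --old while the pool has missing devices is rejected.
        if !missing_devids.is_empty() {
            return Err(ReplaceError::MixedState);
        }
        return Ok(None);
    }
    // Optional cross-check: must equal the persisted devid.
    if let Some(given) = missing_id {
        if given != persisted_devid {
            return Err(ReplaceError::OldDevidMismatch { persisted: persisted_devid, given });
        }
    }
    // Cross-check against live btrfs state, regardless of missing count.
    if !missing_devids.contains(&persisted_devid) {
        return Err(ReplaceError::OldNotMissing { devid: persisted_devid });
    }
    Ok(Some(persisted_devid))
}
```

Note that the number of missing devices never enters the missing-path branch: the persisted devid alone selects the member.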
ENOSPC pre-flight check
remove and remove-missing validate that surviving devices have enough
unallocated space to absorb the target device’s allocations before invoking
btrfs device remove. Without this, btrfs will either ENOSPC instantly or
crash the filesystem to read-only mid-relocation (reproduced in
tests/repro/).
The >=2-survivor remove path treats relocation-probe uncertainty as
warn-and-proceed – a miss falls through to btrfs’s clean instant-ENOSPC –
while remove-missing and the 2→1 remove path are fail-closed on any
uncertainty, because a miss there can crash the filesystem read-only with
pending-op.json already written.
remove-missing also refuses an untrusted missing-device allocation shape
before btrfs device remove. Its trust check validates shape, not per-type
completeness: the targeted missing devid must have exactly one usage stanza,
every positive target allocation row must be one of Data/Metadata/System RAID1,
and at least one positive supported row must be present. Missing supported row
types are treated as zero demand because a sparse 3+ device RAID1 member may
legitimately hold only a subset of Data, Metadata, and System chunks.
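The shape check for remove-missing can be illustrated with a small predicate. This is a sketch under assumptions: the Row, Kind, and Profile types and the idea of passing stanzas as nested vectors are invented for the example and do not reflect braid's parser types.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Kind {
    Data,
    Metadata,
    System,
    Other,
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum Profile {
    Raid1,
    Single,
    Dup,
}

/// One allocation row from a device-usage stanza (hypothetical shape).
struct Row {
    kind: Kind,
    profile: Profile,
    bytes: u64,
}

/// `stanzas` holds every usage stanza found for the targeted missing devid.
fn shape_is_trusted(stanzas: &[Vec<Row>]) -> bool {
    // Exactly one usage stanza for the target.
    if stanzas.len() != 1 {
        return false;
    }
    let positive: Vec<&Row> = stanzas[0].iter().filter(|r| r.bytes > 0).collect();
    // At least one positive supported row, and every positive row must be
    // Data/Metadata/System RAID1. Absent row types count as zero demand.
    !positive.is_empty()
        && positive
            .iter()
            .all(|r| r.profile == Profile::Raid1 && r.kind != Kind::Other)
}
```

A stanza that holds only Data RAID1 passes, which matches the sparse 3+ device case described above.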
Single-survivor cases use a path-specific check:
- remove (2→1): the RAID1-aware relocation check does not apply (there is only one remaining device, not two). Instead, a single-survivor capacity check derives demand from btrfs filesystem df logical usage – data + 2 * metadata + 2 * system, reflecting the post-balance single + DUP profile on one device – and compares it to the survivor’s device_size - device_slack. This check runs at plan time and is re-run as a pre-journal gate in execute (above journal::write_journal), closing the plan/execute drift window – a survivor over-committed by writes during the confirmation + inhibitor wait is caught before the irreversible -f balance and fails clean, with no pending-op.json stranded.
- remove-missing on a 2-device RAID1 pool with 1 missing (pool.total_devices == 2 && pool.devices.len() == 1 && pool.missing_count == 1): rejected at preflight. btrfs_rm_device runs btrfs_check_raid_min_devices(num_devices - 1) and returns BTRFS_ERROR_DEV_RAID1_MIN_NOT_MET whenever the remaining device count would drop below the RAID1 minimum of 2, so the call is guaranteed to fail at the kernel level. The supported repair paths for that case are braid replace (preferred) or braid add followed by braid remove-missing.
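The 2→1 capacity formula is simple arithmetic, sketched below. The function names are hypothetical, all values are byte counts, and the choice of "less than or equal" for the comparison and of failing closed on overflow are assumptions of this example.

```rust
/// Demand on the single survivor: data stays single, metadata and system
/// become DUP, so each of those is counted twice.
fn single_survivor_demand(data: u64, metadata: u64, system: u64) -> Option<u64> {
    data.checked_add(metadata.checked_mul(2)?)?
        .checked_add(system.checked_mul(2)?)
}

/// True when the demand fits in device_size - device_slack.
/// Any arithmetic overflow or underflow is treated as "does not fit".
fn survivor_fits(data: u64, metadata: u64, system: u64, device_size: u64, device_slack: u64) -> bool {
    match (
        single_survivor_demand(data, metadata, system),
        device_size.checked_sub(device_slack),
    ) {
        (Some(demand), Some(capacity)) => demand <= capacity,
        _ => false,
    }
}
```

For example, 100 units of data, 10 of metadata, and 1 of system give a demand of 100 + 20 + 2 = 122, which fits a survivor with 150 usable units but not one with 100. Running the same function at plan time and again just before the journal write is what closes the drift window.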
NixOS-native automation
- systemd braid-unlock.service + braid-pool.target for post-boot unlock
- braid-online.service lifecycle owner (ExecStop=braid lock, RemainAfterExit=yes)
Rejected alternatives
- Keep plan/apply with simpler flags: Still risk-flattening. The core problem is that a generic reconciler treats “reboot recovery” and “add a new disk” as the same kind of operation.
- Separate init-disk + apply: The original approach. Created an artificial code boundary that was hard to explain and required ceremony for the common case.
Consequences
- Five commands instead of three (no init-disk, no plan, no apply; remove split into remove + remove-missing)
- Dry-run/confirmation coverage is a command category, not a blanket guarantee. The pool/LUKS-lifecycle mutators (add, remove, remove-missing, replace, unlock, lock, enroll, recover) support --dry-run, while discover previews by default and commits with --write. --yes is scoped to the confirmation-gated mutations (add, remove, remove-missing, replace) for scripting. Reactive notification-state maintenance (ack) and internal systemd-invoked paths (scrub-*) are deliberately excluded – they are reversible/self-correcting or machine-contract commands where a dry-run preview adds no operator value.
- Tab completion returns disk names from pool.json
Decision: Group-based mount point permissions
Context
When braid mounts the pool at /mnt/storage, the btrfs root is root:root 0755. Regular users can’t write — blocking rsync, cp, and Samba workflows. NAS users need group-level access to the mount root without running everything as root.
Decision
The NixOS module declares a storage group and emits the Unix group name in runtime config as pool_access_group. Rust dispatch reconciles mount point permissions (root:<group> 2770) from mark_online after every braid command that results in a mounted pool.
Why group-based (not ACLs, not per-user subvolumes)
- Group + setgid is the simplest model that covers the NAS use case: a set of trusted users who all need read/write access to the same pool.
- ACLs add complexity and tooling requirements (getfacl/setfacl) without benefit for the typical home NAS.
- Per-user subvolumes solve a different problem (isolation), not shared access.
- This matches the pattern used by TrueNAS and OpenMediaVault.
Why module config drives it
Mount point permissions are OS-level access policy, so the NixOS module owns the group name and whether the fixup is configured. Rust dispatch executes the fixup because it already holds /run/braid-pool.lock through post-mount lifecycle work, which keeps permissions synchronized with the same mounted-pool state that drives braid-online.service.
Why Rust dispatch executes the fixup
The shell wrapper is a pure exec shim that injects tool packages onto PATH. mark_online (cli/src/online_state.rs) applies chown root:<group> + chmod 2770 on the mount point after successful mount-producing commands (unlock, add, recover).
Properties:
- Explicit – runs synchronously after the mounted-pool command succeeds, before control returns to the caller
- Covers all mount paths – braid unlock (direct CLI, systemd service, auto-unlock), braid add (bootstrap), and braid recover (recovery) all go through Rust dispatch
- No async race – unlike a systemd ExecStartPost or path watch, the fixup completes before the caller sees success
- Idempotent – permissions persist in btrfs metadata; re-running is a no-op
- Failure-tolerant – warns to stderr if chown/chmod fails; never overrides the wrapped command’s exit code
Why storage as default group name
- Standard NAS convention (TrueNAS, OMV use similar names)
- No collision with existing NixOS system groups
- Configurable via braid.poolAccessGroup; set to null to disable entirely
Scope
This sets ownership and mode on the mount root directory only. It does NOT:
- Override per-file permissions (files created with restrictive umask remain restrictive)
- Provide a complete multi-user collaboration model
- Manage ACLs or sub-directory policies
The setgid bit (2770) ensures new files/directories in the mount root inherit the storage group, but the owning user’s umask still controls the group-write bit on individual files.
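The interaction between the setgid bit and the creating user's umask is plain bit arithmetic. The helpers below are illustrative only (their names are made up) and use the standard Unix rule that a new file's mode is the requested mode with the umask bits cleared.

```rust
/// Mode a newly created file ends up with under a given umask.
fn effective_mode(requested: u32, umask: u32) -> u32 {
    requested & !umask
}

/// True when the setgid bit (02000) is set, as in the 2770 mount root.
fn has_setgid(mode: u32) -> bool {
    (mode & 0o2000) != 0
}

/// True when the group-write bit (0020) is set.
fn group_writable(mode: u32) -> bool {
    (mode & 0o020) != 0
}
```

With a umask of 022, a file requested as 0666 becomes 0644: it inherits the storage group through setgid on the directory, but other group members cannot write to it. A umask of 002 yields 0664, which is group-writable.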
See
- cli/src/online_state.rs – mark_online permission fixup
- modules/braid/options.nix – poolAccessGroup option definition
- Sane defaults – philosophy on opinionated defaults
First-Class Alerts for Disk Health
Context
Synology NAS boxes beep when a disk develops bad sectors — you hear it, SSH in, and deal with it. Without active alerting, a braid NAS user has no idea anything is wrong unless they happen to run braid status.
Decision
Alert as primary domain concept
braid has first-class Alerts. An Alert represents “something happened that needs human acknowledgment.” Beeping is one notification mechanism for an active alert. braid status is the primary surface for understanding alert details. braid ack acknowledges current alerts and silences notifications.
Shared alert computation
A single shared computation produces an AlertState consumed by all surfaces — braid monitor (exit code), braid status (banner + causes), TUI (banner + indicators). No surface re-encodes alert logic.
Alert causes
AlertCause is an explicit enum:
- BtrfsDeviceErrors { devid } – non-zero btrfs device stat counters above acked baseline, excluding alert-local missing devids
- MissingDevice { devid } – device missing from pool
- SmartdAlert – smartd SMART health warning
- ComputationError { detail } – probe or parse failed before a structured cause could be determined
The status banner is cause-neutral (“disk health issue detected”); cause details appear below it and in JSON output.
Two detection sources, one alert model
braid owns btrfs device stats + missing device detection. smartd owns SMART monitoring and writes a flag file (/var/lib/braid/smartd-alert) when triggered. The shared computation checks btrfs stats, missing devices, and smartd.
All five btrfs device stat counters trigger alerts
write_io_errs, read_io_errs, flush_io_errs, corruption_errs, generation_errs. Any non-zero counter above the acked baseline triggers an alert for a present recognized devid. Devids in the alert-local missing set are excluded from BtrfsDeviceErrors and alert through MissingDevice instead.
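The "above the acked baseline" rule can be sketched as a per-counter comparison. The five field names come from btrfs device stats as listed above; the DevStats struct, the helper names, and the treatment of a missing baseline as all zeros are assumptions of this example.

```rust
/// The five btrfs device stat counters for one devid.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct DevStats {
    write_io_errs: u64,
    read_io_errs: u64,
    flush_io_errs: u64,
    corruption_errs: u64,
    generation_errs: u64,
}

impl DevStats {
    fn counters(&self) -> [u64; 5] {
        [
            self.write_io_errs,
            self.read_io_errs,
            self.flush_io_errs,
            self.corruption_errs,
            self.generation_errs,
        ]
    }
}

/// True when any counter is strictly above its acked baseline.
/// With no baseline, every counter is compared against zero.
fn exceeds_baseline(current: &DevStats, acked: Option<&DevStats>) -> bool {
    let baseline = acked.copied().unwrap_or_default().counters();
    let now = current.counters();
    now.iter().zip(baseline.iter()).any(|(cur, base)| cur > base)
}
```

Acknowledging a device with three corruption errors stores that value as the baseline, so the alert fires again only when a counter climbs past it.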
Two kernel paths feed those counters: ordinary I/O and scrub. Scrub records
read, checksum, and generation failures by incrementing
BTRFS_DEV_STAT_READ_ERRS, BTRFS_DEV_STAT_CORRUPTION_ERRS, and
BTRFS_DEV_STAT_GENERATION_ERRS in
reference/linux/fs/btrfs/scrub.c:985-993. The monitor polls the same device
stats either way, so scrub-discovered uncorrectable errors reach the operator
through the same BtrfsDeviceErrors cause and beep as everyday I/O errors; a
separate scrub-status alert probe would be redundant with this pipeline.
Latched alerts
Alerts persist until braid ack — even if the triggering condition disappears. This means “something happened that needs acknowledgment,” not “something is currently true.” This avoids cross-source bugs where one source clearing could hide another source’s alert, and matches Synology UX.
Ack snapshots gating inputs before probing
cmd_ack reads the alert latch, the smartd flag (smartd-alert), and the ack cleanup-pending sentinel (alert-cleanup-pending) once at function entry, before probe_pool_alerts. Every decision in that ack – the gate that decides whether to proceed, the cleanup-only retry branch, and the cleanup that removes alert files – references that single snapshot. If the sentinel is the only live signal, cmd_ack runs a cleanup-only retry branch before probe_pool_alerts so recovery does not depend on probe success and does not rewrite the acknowledged baseline. The alert probe is devid-keyed and intentionally does not depend on LUKS UUID identity or pool FSID. The pool lock at /run/braid-pool.lock already serializes monitor vs ack vs add/remove writers, but the smartd hook is intentionally unlocked, so a per-ack snapshot is the only mechanism that gives ack a coherent view of smartd state.
The smartd flag is cleared during cleanup when either the snapshot observed the flag active or the snapshot’s latch carried a SmartdAlert cause. The first arm covers the normal “flag present, ack silences it” case. The second arm is an explicit exception for the crash-recovery case where a prior cycle latched SmartdAlert but the flag was already absent at snapshot, such as a partially-applied earlier ack, manual state, or filesystem-level divergence. The user’s ack is aimed at the latched smartd source, so a flag that the smartd hook writes during the probe is part of that source and is cleared.
A flag that exists at cleanup time when the snapshot saw neither active smartd state nor a latched SmartdAlert cause arrived after the snapshot and is left in place: the next monitor cycle is responsible for latching it cleanly.
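The two-arm rule for clearing the smartd flag depends only on the snapshot taken at function entry. A minimal sketch, with a hypothetical AckSnapshot struct and field names:

```rust
/// What the ack observed once, at function entry (hypothetical shape).
struct AckSnapshot {
    /// The smartd-alert flag file was present at snapshot time.
    smartd_flag_active: bool,
    /// The snapshot's latch carried a SmartdAlert cause.
    latch_has_smartd_cause: bool,
}

/// Cleanup removes the smartd flag when either arm holds. A flag that
/// shows up later, with neither arm set, is left for the next monitor cycle.
fn should_clear_smartd_flag(snapshot: &AckSnapshot) -> bool {
    snapshot.smartd_flag_active || snapshot.latch_has_smartd_cause
}
```

The decision deliberately ignores whether the flag exists at cleanup time, which is what keeps a late-arriving flag from being silently swallowed by an ack that never saw it.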
Ack state keyed by btrfs devid
Acked baselines are keyed by btrfs devid (acked-stats.json maps stringified devid to baseline) – no path or LUKS UUID mapping is required to associate a stats row with its baseline. The parser captures missing device devids from MISSING sentinel lines.
Membership cross-reference is performed at the alert-pipeline boundary, not at the baseline-keying level. AlertPoolState::recognized_devids (in cli/src/probe.rs) returns the union of present_devids, null_underlying, and missing_devids for the current cycle. Both compute_alert_state and snapshot_current filter btrfs device stats rows against that set before emitting causes or writing baselines. A stats row whose devid is outside the recognized set is treated as transient/stale identity: it cannot latch BtrfsDeviceErrors, and braid ack does not persist a baseline for it, which prevents a loop on the next monitor cycle’s reconcile_acked_stats prune.
Within the recognized set, compute_alert_state also skips rows whose devid is in the alert-local missing set (missing_devids plus null_underlying). Those rows alert through MissingDevice, not BtrfsDeviceErrors, regardless of the device string btrfs printed. snapshot_current still records recognized rows by devid before layering missing_acked = true from the missing set, so a returning member does not re-alert on stale counters already acknowledged while missing.
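The two filters described above (recognized set for baselines, recognized minus alert-local missing for BtrfsDeviceErrors) can be sketched with set operations. The struct mirrors the three sets named in the text, but its layout, the u64 devid type, and the two helper functions are assumptions for illustration.

```rust
use std::collections::BTreeSet;

/// Simplified stand-in for the pool view used by the alert pipeline.
struct AlertPoolState {
    present_devids: BTreeSet<u64>,
    null_underlying: BTreeSet<u64>,
    missing_devids: BTreeSet<u64>,
}

impl AlertPoolState {
    /// Union of present, null-underlying, and missing devids.
    fn recognized_devids(&self) -> BTreeSet<u64> {
        self.present_devids
            .iter()
            .chain(self.null_underlying.iter())
            .chain(self.missing_devids.iter())
            .copied()
            .collect()
    }

    /// Devids that alert through MissingDevice instead of BtrfsDeviceErrors.
    fn alert_local_missing(&self) -> BTreeSet<u64> {
        self.missing_devids.union(&self.null_underlying).copied().collect()
    }
}

/// Stats rows that may latch BtrfsDeviceErrors.
fn error_eligible(pool: &AlertPoolState, row_devids: &[u64]) -> Vec<u64> {
    let recognized = pool.recognized_devids();
    let missing = pool.alert_local_missing();
    row_devids
        .iter()
        .copied()
        .filter(|d| recognized.contains(d) && !missing.contains(d))
        .collect()
}

/// Stats rows for which an ack may persist a baseline.
fn baseline_eligible(pool: &AlertPoolState, row_devids: &[u64]) -> Vec<u64> {
    let recognized = pool.recognized_devids();
    row_devids.iter().copied().filter(|d| recognized.contains(d)).collect()
}
```

A row for an unrecognized devid is dropped by both filters, so it can neither latch an alert nor leave behind a baseline for the next reconcile to prune.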
Ack state separate from pool.json
Different concerns (identity vs acknowledgment), different write patterns, different risk profiles (precious vs disposable). Stored at /var/lib/braid/acked-stats.json.
Ack state is machine-local
On a new machine, acked state doesn’t exist — everything evaluates fresh.
braid monitor is a pure detector
Checks state and returns an exit code. Does not start/stop services. The systemd wrapper starts the beeper on exit 1.
Exit codes:
- 0 – ok, pool offline with no active alerts, or pool-lock-contended cycle (silently skipped; re-evaluated on the next timer tick)
- 1 – alert active (disk health issue OR indeterminate state latched as ComputationError – e.g. probe failure, parse failure, unmapped device)
- 2 – pre-cmd_monitor setup failure (e.g. pool-lock I/O, config load failure). Reserved for “could not even attempt to detect”; never emitted by cmd_monitor itself.
Fail closed: any failure inside cmd_monitor that leaves pool state indeterminate latches a ComputationError cause and reports exit 1, so the systemd wrapper starts the beeper. Exit 2 means the monitor never ran – a beep would be meaningless because there is no AlertState to report.
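The exit-code contract can be written down as a single match. The MonitorOutcome enum and its variant names are hypothetical; only the three numeric codes and their meanings come from the list above.

```rust
/// Hypothetical classification of one monitor invocation.
#[derive(Clone, Copy, Debug, PartialEq)]
enum MonitorOutcome {
    Healthy,
    PoolOffline,
    LockContended,
    AlertActive,
    /// Indeterminate state, latched as ComputationError.
    Indeterminate,
    /// Failure before cmd_monitor could run at all.
    SetupFailure,
}

fn exit_code(outcome: MonitorOutcome) -> i32 {
    match outcome {
        MonitorOutcome::Healthy | MonitorOutcome::PoolOffline | MonitorOutcome::LockContended => 0,
        // Fail closed: indeterminate state beeps like a real alert.
        MonitorOutcome::AlertActive | MonitorOutcome::Indeterminate => 1,
        MonitorOutcome::SetupFailure => 2,
    }
}
```

Mapping Indeterminate to 1 rather than 2 is the fail-closed part: the systemd wrapper starts the beeper on exit 1, and a state that could not be evaluated is treated as needing attention.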
Alert-state mutators are serialized by /run/braid-pool.lock. Every command that writes acked-stats.json or alert-latch.json (monitor, ack, add, remove, remove-missing) acquires /run/braid-pool.lock in Rust dispatch (see ADR 026) before reading state or running probes. This is intentionally the same lock used by pool mutators: monitor and ack perform read-modify-write cycles around subprocess I/O, while add/remove/remove-missing prune acked baselines as membership changes. Sharing one lock keeps “baseline and latch clear” authoritative and prevents stale monitor snapshots from resurrecting acknowledged alerts.
Mount presence is read from /proc/self/mountinfo via mount_check::fstype_at_mount_via_fs, not from findmnt. A readable, well-formed mountinfo file with no entry for the configured mount point is legitimate PoolOffline and exits 0. Any mountinfo I/O failure, malformed line, or duplicate target entry is indeterminate state: it surfaces as ProbeError::MountInfo, latches ComputationError, exits 1, and starts the beeper.
Self-heals stale ack state (resets missing_acked for now-present devids after drive replacement).
Periodic one-shot, not daemon
systemd timer + oneshot service. No mount condition on the timer — braid monitor handles pool-not-mounted gracefully (exit 0).
On by default
braid.monitor.enable defaults to true when braid.enable is true. beep/pcspkr failures are silently swallowed.
Audible doctor beep is opt-in
Plain braid doctor reports the alert-beep check as skipped after confirming
beep monitoring is configured. braid doctor --beep runs the canonical
braid-beep-probe wrapper so operators can test the real alert sound on
purpose. braid doctor --json always skips the audible probe, and --json
conflicts with --beep at parse time, so machine-readable output has no
audible side effects.
Latch as append/refresh log
The alert latch is an append/refresh log of all unacked causes from all sources. Each monitor cycle loads the existing latch, computes new causes, and merges. Previously-latched causes that aren’t re-detected are carried forward. Newly-detected causes replace their latched counterpart (same key = fresher evidence). This ensures all cause types persist until braid ack, even if the triggering condition resolves — fixing the invariant for all sources, not just journal.
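The append/refresh merge can be sketched as follows. The cause enum follows the variants listed earlier, but the payload types, the signatures, and the exact key rule are assumptions of this example: here every ComputationError shares one slot, and other causes match on equality.

```rust
#[derive(Clone, Debug, PartialEq)]
enum AlertCause {
    BtrfsDeviceErrors { devid: u64 },
    MissingDevice { devid: u64 },
    SmartdAlert,
    ComputationError { detail: String },
}

/// Two causes occupy the same latch slot when their keys match.
fn same_cause_key(a: &AlertCause, b: &AlertCause) -> bool {
    match (a, b) {
        (AlertCause::ComputationError { .. }, AlertCause::ComputationError { .. }) => true,
        _ => a == b,
    }
}

/// Carry forward latched causes, refresh re-detected ones, append new ones.
fn merge_into_latch(latched: &[AlertCause], fresh: &[AlertCause]) -> Vec<AlertCause> {
    let mut merged: Vec<AlertCause> = latched.to_vec();
    for cause in fresh {
        match merged.iter().position(|old| same_cause_key(old, cause)) {
            // Same key: the fresher evidence replaces the latched entry.
            Some(index) => merged[index] = cause.clone(),
            None => merged.push(cause.clone()),
        }
    }
    merged
}
```

A monitor cycle that detects nothing returns the latch unchanged, which is how causes survive until an ack even after the triggering condition resolves.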
Corrupt latch recovery
load_alert_latch returns Result<Option<AlertState>, LatchLoadError> so callers can distinguish three outcomes: file absent (Ok(None), normal – no active alerts), I/O failure (Err(Read)), and unparseable on-disk content (Err(Parse)). Each caller picks its own fail-closed policy:
- cmd_monitor is the only path that mutates the latch. On read/parse failure it quarantines the bad bytes by linking alert-latch.json to alert-latch.json.corrupt and then removing the live path, then writes a fresh latch containing a loud ComputationError cause whose detail names the failure. Quarantine uses hard_link + remove_file (not rename) so an already-existing sidecar is detected atomically by link(2)’s EEXIST; when that happens, the first sidecar is preserved as the highest-value forensic snapshot and the new corruption is surfaced only in the ComputationError detail. Any I/O failure during quarantine is folded into the same detail rather than silently dropped. The corruption signal is folded into a single ComputationError (not appended as a second cause), because merge_into_latch collapses every ComputationError into one slot via same_cause_key – appending two would silently drop one.
- cmd_status is the read-only surface: resolve_alert_state surfaces a corrupt latch as a ComputationError cause but never moves the file (status must not mutate state).
- cmd_ack treats a corrupt latch as an active alert for gating purposes – otherwise a genuinely unmounted ack would refuse with PoolNotMounted and the user would have no way to clear a corrupt file with the pool offline. Mounted ack and genuinely unmounted ack clean up both alert-latch.json and the .corrupt sidecar. A foreign fstype at the configured mount point is a probe error, not offline ack, and preserves the unreadable latch bytes.
This preserves “latched until ack” even when the on-disk state is unreadable: the operator sees a loud ComputationError, the bad bytes are preserved for forensics until an ack path that can safely clean them up, and ack succeeds for mounted or genuinely unmounted pools.
Cleanup ordering and retry-on-failure
Ack cleanup preserves three invariants. First, the beeper stop hook is attempted before any fallible cleanup operation; the hook is best-effort, so the invariant is that the stop attempt runs, not that sound was proven stopped. Second, destructive removals run in smartd-alert -> alert-latch.json -> alert-latch.json.corrupt order, so the corrupt sidecar leaves last and the forensic guarantee above is preserved across cleanup failures. Third, ack writes alert-cleanup-pending after the stop hook and before the first destructive step, then clears it only after the last destructive step succeeds.
CleanupFailed recovery has two cases. If creating alert-cleanup-pending itself fails, no destructive removal has run, so the original entry signals are byte-identical and the retry is driven by the normal ack path. If marker creation succeeded and a later step failed, the sentinel remains on disk. cmd_ack consults that sentinel before probing; when it is the only live signal, the hoisted cleanup-only branch reruns cleanup without probe_pool_alerts, without runner requests, and without rewriting acked-stats.json. Either path makes re-running braid ack after fixing the I/O fault genuinely idempotent.
Offline ack policy
braid ack works with the pool locked, but only when the pool is genuinely unmounted: /proc/self/mountinfo has no entry for the configured mount point. If the configured mount point is occupied by a non-btrfs filesystem, cmd_ack returns ProbeError::NotBtrfs; it must not clear alert-latch.json, remove smartd-alert, create or rewrite acked-stats.json, stop the beeper, or quarantine corrupt latch bytes.
For genuine offline ack, the persistence layer has an asymmetry by cause type:
- MissingDevice { devid } – offline ack reads the latch and applies missing_acked = true to that devid in acked-stats.json (insert-or-update; existing device_stats baselines are preserved). The next mounted monitor cycle suppresses the cause, and reconcile_acked_stats self-heals missing_acked back to false if the device returns.
- BtrfsDeviceErrors { devid } – offline ack refuses with an actionable error (“cannot ack btrfs device errors while pool is offline – unlock the pool first”). The counter baseline that suppresses re-firing is the current output of btrfs device stats, which requires a mounted pool. Refusing the whole ack (not partial-acking other causes) avoids leaving the operator in an “I acked but it still says ALERT” state.
- SmartdAlert – offline ack removes the smartd flag file (the authoritative trigger source); no acked-stats.json write is needed.
- ComputationError – offline ack removes the latch; the cause re-fires on the next monitor cycle only if the underlying computation still fails.
Coupled to the asymmetry: offline ack only loads acked-stats.json when at least one MissingDevice cause is latched, so an unrelated corrupt acked-stats.json cannot block an offline ack of a pure SmartdAlert or ComputationError latch. When acked-stats.json is loaded (a MissingDevice cause is being applied), the fail-closed load_acked_stats_fallible is used so corrupt files are propagated as I/O errors rather than silently overwritten – matching the policy in drop_ghost_acked_for_devids.
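The per-cause asymmetry, including the rule that acked-stats.json is loaded only when a MissingDevice cause is latched, can be sketched as a planning function. The OfflineAck type, its fields, and the function name are hypothetical; the refusal text is the message quoted above.

```rust
#[derive(Clone, Debug, PartialEq)]
enum AlertCause {
    BtrfsDeviceErrors { devid: u64 },
    MissingDevice { devid: u64 },
    SmartdAlert,
    ComputationError { detail: String },
}

/// Hypothetical plan for an ack against a genuinely unmounted pool.
#[derive(Debug, PartialEq)]
enum OfflineAck {
    Refuse(&'static str),
    Proceed {
        load_acked_stats: bool,
        mark_missing_acked: Vec<u64>,
        remove_smartd_flag: bool,
    },
}

fn plan_offline_ack(latched: &[AlertCause]) -> OfflineAck {
    // Any BtrfsDeviceErrors cause refuses the whole ack, never a partial one.
    if latched.iter().any(|c| matches!(c, AlertCause::BtrfsDeviceErrors { .. })) {
        return OfflineAck::Refuse(
            "cannot ack btrfs device errors while pool is offline – unlock the pool first",
        );
    }
    let missing: Vec<u64> = latched
        .iter()
        .filter_map(|c| match c {
            AlertCause::MissingDevice { devid } => Some(*devid),
            _ => None,
        })
        .collect();
    OfflineAck::Proceed {
        // acked-stats.json is touched only when a MissingDevice cause exists.
        load_acked_stats: !missing.is_empty(),
        mark_missing_acked: missing,
        remove_smartd_flag: latched.iter().any(|c| *c == AlertCause::SmartdAlert),
    }
}
```

In this model a pure SmartdAlert or ComputationError latch never sets load_acked_stats, so a corrupt acked-stats.json cannot block that ack.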
Acked-stats hygiene across pool membership changes
btrfs allocates new devids as last_devid + 1 (kernel: fs/btrfs/volumes.c, find_next_devid), so a remove-then-add sequence reuses the removed devid only when that devid was the current maximum at remove time. Removing a non-max devid leaves a permanent gap. A stale acked-stats entry for a reused devid would otherwise carry the previous holder’s device_stats baseline (suppressing health alerts until counters exceed the ghost) or its missing_acked = true flag (suppressing missing-device alerts) onto the fresh disk.
Invariant: a reused devid must never inherit the previous holder’s ack baseline.
Three layers enforce it:
- Add-time guard (correctness boundary): cmd_add clears acked-stats unconditionally on bootstrap and drops the assigned devid per-disk inside the live-pool add loop. cmd_recover, when finishing an interrupted add, mirrors both: bootstrap-recovery calls remove_acked_stats, and live-add recovery drops every journaled target’s devid (per-arm after a replayed pool_add_device, and via a final sweep when the target was already live at recovery entry – the committed-but-closed crash window). Cleanup failure here is command-fatal in both cmd_add and cmd_recover: the error names the stage and instructs the user to delete the file before relying on alerts.
- Remove-time prune (hygiene): cmd_remove and cmd_remove_missing drop the affected devid on success. cmd_recover mirrors the prune for committed removes only – the Remove guard may restore a target whose eviction did not complete, in which case its acked-stats entry is a legitimate baseline that must survive. Cleanup failure here is non-fatal (warning) – the next add for that devid will catch it via layer 1. The cmd_remove planner enriches the journaled pre_membership with the target’s live btrfs devid so recovery can resolve it after a discover-time pool.json.
- Monitor reconcile (defense-in-depth): cmd_monitor prunes orphan entries (devid no longer in pool.present_devids, pool.null_underlying, or pool.missing_devids) every cycle. This catches crash recovery and manual btrfs operations performed outside braid. It cannot detect ghost data once a devid is reused, so the add-time layer is the boundary for that case. The read itself uses load_acked_stats_fallible so a corrupt or unreadable acked-stats.json latches ComputationError instead of silently re-firing acked causes against an empty baseline, matching offline ack and drop_ghost_acked_for_devids. A save failure during reconcile latches the same ComputationError so a persistent FS write fault (EROFS, ENOSPC, or EACCES on acked-stats.json or its parent) surfaces via exit-1 beep rather than accumulating only in journald.
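The orphan prune in the monitor-reconcile layer might look roughly like the following. This is a minimal sketch under stated assumptions: AckedEntry and prune_orphans are hypothetical names, and the three sets stand in for pool.present_devids, pool.null_underlying, and pool.missing_devids.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical per-devid ack record; field names follow the prose above.
#[derive(Debug, Clone, PartialEq)]
struct AckedEntry {
    missing_acked: bool,
    device_stats_baseline: u64,
}

/// Drop entries whose devid appears in none of the three known sets.
/// Returns the pruned devids in ascending order.
fn prune_orphans(
    acked: &mut BTreeMap<u64, AckedEntry>,
    present: &BTreeSet<u64>,
    null_underlying: &BTreeSet<u64>,
    missing: &BTreeSet<u64>,
) -> Vec<u64> {
    let orphans: Vec<u64> = acked
        .keys()
        .copied()
        .filter(|d| !present.contains(d) && !null_underlying.contains(d) && !missing.contains(d))
        .collect();
    for d in &orphans {
        acked.remove(d);
    }
    orphans
}
```

Note that the prune keys on devid alone, which is why it cannot tell a reused devid from its previous holder.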
Backstop: independently of those three layers, the alert computation fails loud when the acked baseline is no longer comparable to the current counter stream. compute_alert_state treats an acked counter that exceeds the current btrfs device stats value as 0 and alerts on any nonzero current. btrfs device-stats counters are persistent and monotonic (reset only by -z, which braid never runs), so the only ways a current value can sit below the ack baseline are a reused devid that inherited a ghost baseline before add/recover cleanup dropped its acked entry (the committed-but-closed crash window above), or an operator resetting the live counters with btrfs device stats -z. The three layers aim to remove a stale baseline; this guard ensures that if one transiently survives, it cannot suppress a later nonzero counter.
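The backstop comparison can be sketched as follows. The function names are hypothetical and this is not the body of compute_alert_state; it only encodes the rule stated above for a single counter.

```rust
/// An acked value above the current counter is not comparable to the
/// current counter stream, so it is treated as a baseline of 0.
fn effective_baseline(acked: u64, current: u64) -> u64 {
    if acked > current { 0 } else { acked }
}

/// True when the counter should raise an alert: the current value
/// exceeds the effective baseline.
fn counter_alerts(acked: u64, current: u64) -> bool {
    current > effective_baseline(acked, current)
}
```

With an acked baseline of 5, a current value of 5 stays quiet, 7 alerts, and 3 also alerts because the ghost baseline is discarded; a current value of 0 never alerts.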
Rejected alternatives
- Daemon-based monitoring: more complex lifecycle management for no benefit over a timer + oneshot
- Storing alerts in a database: unnecessary complexity; file-based flag + JSON is sufficient
- Per-surface alert logic: each surface re-checking btrfs stats independently would lead to inconsistencies
- Counter-based thresholds (e.g., alert after N errors): any non-zero counter above baseline is worth investigating; thresholds delay detection
- Kernel journal scanning: originally implemented as a supplementary alert source scanning journalctl -k for “BTRFS error” messages. Removed because btrfs commits every 30 seconds, which increments device stats counters for any disk error within that window. The 5-minute monitor poll catches those counters reliably. Journal scanning was redundant with device stats and added significant complexity (cursor tracking, regex parsing, crash-safe cursor ordering, latch merge logic). Repro VMs in tests/repro/kernel-journal-* preserve the empirical evidence from the original investigation.
Decision: HDD defaults
Principle: HDD defaults
Context
braid manages a NAS pool of LUKS-encrypted btrfs RAID1 drives. The typical deployment is bulk storage on large-capacity spinning drives (e.g., 12–16 TB HDDs). Several defaults already assume rotational media:
- cryptsetup open omits --allow-discards, so TRIM/discard requests from btrfs never reach the underlying device. btrfs also exposes a mount-layer discard knob (discard=async, the kernel default since 6.2 on devices that advertise discard support), but braid’s LUKS layer gates it: without --allow-discards, the mapped device never reports discard support upward, so the kernel default never activates and any explicit discard=async would be silently dropped.
- noatime avoids relatime’s read-triggered metadata write-amplification on every RAID1 copy.
- Monthly scrub interval is tuned for spinning disk wear and noise.
Making braid flash-aware would mean adding --allow-discards (with its security tradeoff of leaking block-usage patterns through the encryption layer), flash-specific scrub/balance scheduling, and flash-targeted test coverage. None of this is warranted for the target use case.
Note: braid already handles flash media in its monitoring paths — NVMe SMART parsing (cli/src/parse/smartctl.rs) and transport-type detection (cli/src/tui/probe.rs) work with any drive type. This decision is about operational defaults, not monitoring.
Decision
Defaults are chosen for HDD NAS deployments. Flash media (SSDs, NVMe, USB sticks) may function but are not a validated or optimized target.
Tradeoffs accepted
- No TRIM passthrough – braid pins discard off at the LUKS layer by omitting --allow-discards and, by consequence, at the btrfs mount layer because no effective discard=async can pass through regardless of kernel default. SSDs used with braid experience increased write amplification and performance degradation over time.
- No flash-specific testing – flash-related issues in LUKS or mount configuration may go unnoticed.
Alternatives considered
Default-on btrfs compression (compress=zstd:1)
Rejected. braid targets HDD bulk-storage NAS pools where the dominant content is media and archives: video, audio, photos, and other formats already compressed at the application layer. Transparent filesystem compression usually saves little or no space on that mix, while the btrfs heuristic that skips incompressible extents still spends CPU on each write. On low-power NAS hardware, that cost is not free.
Reversal is also partial. Removing compress=... affects future writes only; extents already written compressed stay compressed until the data is rewritten or explicitly defragmented. Making compression the default would bake that conversion cost into pre-v1.0 software for users who later discover their workload does not benefit. For that reason, cli/src/cmd.rs base_mount_options() intentionally omits compression.
Fedora’s compress=zstd:1 precedent is workstation root filesystems on SSDs: binaries, logs, configs, and package payloads. That precedent does not transfer cleanly to HDD bulk-storage NAS workloads. Users with compression-friendly data, such as text, code, or document servers, can opt specific paths into compression today with btrfs property set <path> compression zstd; reference/btrfs-progs/Documentation/btrfs-property.rst documents this modern per-inode interface. This is preferable to legacy chattr +c, which uses ext2-style flags and defaults to zlib. No braid feature gate is needed for this per-path opt-in.
See
- cli/src/cmd.rs – CryptsetupLuksOpen and CryptsetupLuksOpenKeyFile omit --allow-discards
- cli/src/cmd.rs – base_mount_options() omits any discard option, relying on the kernel default that is itself gated by the LUKS layer
- cli/src/cmd.rs#base_mount_options – sets noatime to avoid relatime’s read-triggered metadata write-amplification on RAID1.
- Sane defaults – scrub interval tuned for spinning disks
- ADR 031: Drive-wake posture – mounted drives are treated as awake; noatime is not spindown management.
Auto-Suspend via autosuspend + braid idle
Context
HDDs in a btrfs RAID1 NAS can’t rely on per-drive spindown — btrfs periodic commits (every 30s), smartd polling, and braid-monitor health checks wake drives frequently. The user wants the NAS to be quiet and low-power when not in use, and responsive when needed.
Scope note: this decision governs whole-system suspend-to-RAM (S3). Its
per-drive spindown context explains why braid chose system suspend for mounted
NAS idle behavior; it does not preclude a future opt-in per-drive
braid.autoSpinDown that parks drives only while the pool is locked. See
ADR 031: Drive-wake posture.
Decision
Whole-system suspend-to-RAM
The entire NixOS machine suspends when idle. This preserves LUKS keys and the mounted btrfs pool in RAM — no re-unlock ceremony on wake. Drives stop, CPU stops, fans stop. Wake via Wake-on-LAN or RTC alarm.
autosuspend as the daemon
autosuspend is an existing Python daemon in nixpkgs that handles idle countdown, periodic activity checks, and RTC wakeup scheduling. When the host is idle, it executes the configured suspend command (typically systemctl suspend). systemd/logind then applies the actual sleep request semantics, including honoring active high-level sleep inhibitor locks. Writing a custom daemon for this would reimplement what autosuspend already does well.
braid configures autosuspend via the existing NixOS module (services.autosuspend). The user writes braid.autoSuspend.enable = true; and gets sensible defaults.
braid idle as the btrfs check
A separate CLI command (braid idle) checks for an in-flight scrub plus any kernel exclusive operation (balance, balance paused, device add, device remove, device replace, resize, swap activate). The exclusive-operation states are read from /sys/fs/btrfs/<fsid>/exclusive_operation – the same source preflight.rs uses for mutating commands – so the two code paths cannot disagree about what counts as busy. Scrub is read separately via btrfs scrub status because scrub is not in the kernel’s exclusive-operation set (see reference/btrfs-progs/common/utils.c:1188-1197). autosuspend calls braid idle via ExternalCommand check.
Why a separate command rather than inline shell in autosuspend config:
- braid already has the parser for btrfs scrub status and the sysfs read helper
- Fail-closed behavior (probe failures map to Busy(Unknown) -> exit 1 -> block suspend; setup/config errors stay at exit 2 and also block via !) is easier to get right in Rust than in shell
- Testable with unit tests via MockRunner + a Filesystem mock
braid wol-ready as the Wake-on-LAN check
A hidden CLI command (braid wol-ready) checks the configured braid.autoSuspend.wolInterface immediately before autosuspend is allowed to suspend the host. It runs ethtool <iface> through braid’s command runner and reuses the same WoL classifier as braid doctor, so the on-demand diagnostic and the per-suspend gate cannot drift on what counts as magic-packet armed.
Invariant: braid.autoSuspend will not automatically suspend the NAS unless braid.autoSuspend.wolInterface currently reports Wake-on: g.
The command is intentionally scoped to braid’s autosuspend path. Manual systemctl suspend remains available for admin maintenance, local testing, and machines where the operator deliberately accepts the wake risk. A universal sleep.target gate was considered and deferred because it would turn braid’s claim from “braid will not auto-suspend unsafely” into “this machine may not suspend at all,” which is a broader and more surprising ownership boundary.
Exit code inversion
braid idle and braid wol-ready follow natural Unix convention (exit 0 = success). autosuspend’s ExternalCommand convention is inverted (exit 0 = activity detected). The NixOS module bridges this with bash -c '! <command>':
| braid command | braid exit | Meaning | After ! | autosuspend result |
|---|---|---|---|---|
| braid idle | 0 | idle | 1 | allow suspend |
| braid idle | 1 | busy or probe failure | 0 | block suspend (fail-closed) |
| braid idle | 2 | setup error | 0 | block suspend (fail-closed) |
| braid wol-ready | 0 | Wake-on: g armed | 1 | allow suspend |
| braid wol-ready | 1 | not armed or unverifiable | 0 | block suspend (fail-closed) |
| braid wol-ready | 2 | setup error | 0 | block suspend (fail-closed) |
| either command | timeout | signal-killable overrun >10s | 0 | block suspend (fail-closed) |
timeout must be inside bash -c so its non-zero overrun result is inverted by !. An outer timeout (timeout -k 2 10 bash -c '! braid idle') would fail open: bash gets killed before ! runs, autosuspend sees the non-zero timeout result and treats it as no activity. Coreutils’ timeout sends TERM at the main deadline and -k 2 escalates to KILL two seconds later for processes that ignore or delay TERM (see reference/coreutils/src/timeout.c).
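The inversion and the inner-versus-outer timeout distinction can be modelled in a few lines. This is an illustrative sketch only: the names are hypothetical, and 124 is used as an example of the non-zero status coreutils timeout reports when the deadline expires.

```rust
/// What autosuspend concludes from an ExternalCommand exit status.
#[derive(Debug, PartialEq)]
enum Verdict {
    AllowSuspend,
    BlockSuspend,
}

/// Shell `!` semantics: exit 0 becomes 1, any non-zero exit becomes 0.
fn invert(exit: i32) -> i32 {
    if exit == 0 { 1 } else { 0 }
}

/// autosuspend's ExternalCommand convention: exit 0 means activity detected.
fn autosuspend_verdict(exit: i32) -> Verdict {
    if exit == 0 { Verdict::BlockSuspend } else { Verdict::AllowSuspend }
}

/// Timeout inside the shell: its status passes through `!` like any
/// other status of the wrapped command.
fn inner_timeout_gate(wrapped_exit: i32) -> Verdict {
    autosuspend_verdict(invert(wrapped_exit))
}

/// Timeout outside the shell: on overrun the shell is killed before `!`
/// applies, so autosuspend sees the raw non-zero timeout status.
fn outer_timeout_overrun(timeout_exit: i32) -> Verdict {
    autosuspend_verdict(timeout_exit)
}
```

In this model only an exit of 0 from the wrapped command allows suspend through the inner gate, whereas an overrun under the outer arrangement is read as "no activity".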
Scope of the timeout invariant: this covers signal-killable command overruns (parser regression, slow userspace probe, network-FS latency). Uninterruptible kernel waits (process in D state on a wedged ioctl) are not bounded by timeout(1) and remain a separate failure mode; under that condition the autosuspend tick itself stalls until the syscall returns, so the system stays awake by virtue of not deciding.
Mount probe reads /proc/self/mountinfo directly
braid idle’s initial mount-presence check (is_btrfs_mounted) reads /proc/self/mountinfo via the existing Filesystem abstraction rather than shelling out to findmnt. Rationale: the mount probe is a fail-closed safety gate; any subprocess fallback path that maps “non-zero exit + empty stderr” to “no mount” reintroduces the fail-open seam this gate exists to prevent. The kernel-maintained mountinfo file gives a direct answer in one syscall, with no fork/exec.
Octal-escaped mount-point fields (\040, \011, \012, \134) are decoded before comparison so configured mount paths containing whitespace match correctly.
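A decoder for these escapes could be sketched as below. decode_mountinfo_field is a hypothetical name, not braid's actual helper, and keeping unrecognised sequences verbatim is an assumption of this sketch.

```rust
/// Decode three-digit octal escapes in a mountinfo field (for example
/// `\040` for a space). Sequences that are not a backslash followed by
/// three octal digits are kept verbatim.
fn decode_mountinfo_field(raw: &str) -> String {
    let bytes = raw.as_bytes();
    let mut out: Vec<u8> = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'\\' && i + 4 <= bytes.len() {
            let digits = &bytes[i + 1..i + 4];
            if digits.iter().all(|d| (b'0'..=b'7').contains(d)) {
                let value = digits
                    .iter()
                    .fold(0u32, |acc, d| acc * 8 + u32::from(*d - b'0'));
                if value <= 0xFF {
                    out.push(value as u8);
                    i += 4;
                    continue;
                }
            }
        }
        out.push(bytes[i]);
        i += 1;
    }
    String::from_utf8_lossy(&out).into_owned()
}
```

A configured mount path such as "/mnt/my pool" would then compare equal to the decoded form of the field "/mnt/my\040pool".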
IO errors (file unreadable, EIO), malformed mountinfo lines, and ambiguous duplicate target entries surface as Busy(BusyReason::Unknown), exit 1, and block suspend. “Don’t know” never becomes “allow suspend”.
Exclusive-op probe scans /sys/fs/btrfs/* directly
After the mount check passes, cmd_idle reads exclusive_operation from every entry under /sys/fs/btrfs/ via preflight::check_any_btrfs_exclusive_op and returns busy as soon as any one is non-none. No findmnt or btrfs filesystem show subprocesses are invoked on this path; only the scrub probe (btrfs scrub status) remains, because scrub is not part of the kernel’s exclop_def[] set (reference/btrfs-progs/common/utils.c:1186-1194).
Semantics: any in-flight exclusive op on any btrfs filesystem on the host counts as busy. On a typical braid host (one btrfs filesystem, the pool) this is identical to a fsid-scoped check. On a host with btrfs root alongside the pool the reported BusyReason may name an op on the non-pool fs, but the suspend decision is still correct – autosuspend’s job is to err conservative, and “do not suspend while any btrfs is mid-balance/replace/etc.” is the right answer regardless of which fs is busy.
Pseudo-dir skip is by name allowlist (features, debug), not by “absorb any NotFound on read.” The kernel only creates exclusive_operation under per-fsid <uuid>/ dirs (reference/linux/fs/btrfs/sysfs.c:29-47), but treating a missing attribute on any other listed entry as “must have been a pseudo-dir” would silently swallow a real failure mode: a fsid dir whose attribute disappears mid-scan during a concurrent unmount race. Under the allowlist, that race surfaces as ExclusiveOpError::Read and blocks suspend.
Fail-closed branches: list_dir("/sys/fs/btrfs") IO errors, any read error on a non-allowlisted entry’s exclusive_operation (including NotFound), unrecognized parser values, and an empty /sys/fs/btrfs/ after the mount check passed all surface as Busy(BusyReason::Unknown) and exit 1.
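The fail-closed shape of the scan can be sketched over an in-memory stand-in for the directory listing. Everything here is hypothetical (IdleState, scan_exclusive_ops, the tuple layout); the operation names are taken from the list earlier in this decision, and treating a listing with no per-filesystem entries as unknown is an assumption of the sketch.

```rust
#[derive(Debug, PartialEq)]
enum IdleState {
    Idle,
    Busy(String), // a recognised in-flight exclusive operation
    BusyUnknown,  // fail-closed: state could not be determined
}

/// Names under /sys/fs/btrfs/ that are skipped by allowlist.
const PSEUDO_DIRS: [&str; 2] = ["features", "debug"];

/// Exclusive-operation states named in the prose above.
const KNOWN_OPS: [&str; 7] = [
    "balance", "balance paused", "device add", "device remove",
    "device replace", "resize", "swap activate",
];

/// `entries` pairs each listed name with the result of reading its
/// exclusive_operation attribute (None models any read error).
fn scan_exclusive_ops(entries: &[(&str, Option<&str>)]) -> IdleState {
    let mut seen_fs = false;
    for &(name, attr) in entries {
        if PSEUDO_DIRS.contains(&name) {
            continue;
        }
        seen_fs = true;
        match attr.map(str::trim) {
            Some("none") => {}
            Some(op) if KNOWN_OPS.contains(&op) => return IdleState::Busy(op.to_string()),
            _ => return IdleState::BusyUnknown,
        }
    }
    if seen_fs { IdleState::Idle } else { IdleState::BusyUnknown }
}
```

Only an explicit "none" on every non-allowlisted entry yields Idle; a read error, an unrecognised value, or an empty listing all land on the unknown-busy branch.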
The scrub probe is held to the same contract: a parse_btrfs_scrub_status result of ScrubState::Unknown (empty stdout or an unrecognized Status: word) surfaces as Busy(BusyReason::Unknown) and exits 1. Parser drift must not silently allow suspend.
probe::probe_fsid is no longer reached from cmd_idle. It remains in use by non-idle callers (lock.rs and the preflight pipelines that need a UUID for other purposes), and is out of scope for this gate.
Scrub probe is scoped to the pool mount point
Unlike the exclusive-op scan, the scrub probe is not host-wide: cmd_idle
runs btrfs scrub status against only the configured pool mount point. A
scrub on a non-pool btrfs (e.g. the btrfs root) is therefore not detected and
does not block suspend.
This asymmetry is intentional. braid’s autosuspend gate protects the braid
pool, not every btrfs on the host – the same ownership boundary that scopes
braid wol-ready to braid’s suspend path rather than installing a universal
sleep.target gate. The exclusive-op scan is broader only because one pass
over /sys/fs/btrfs/* reads every filesystem’s state for free and errs
conservative; matching that breadth for scrub would mean spawning a btrfs scrub status subprocess per filesystem on every autosuspend tick, for
coverage braid does not own.
SSH always on, SMB/NFS auto-detected
SSH check is unconditional — braid requires SSH for unlock, and an active SSH session means someone is working. SMB and NFS checks are auto-detected from config.services.samba.enable and config.services.nfs.server.enable to avoid false positives on systems that don’t run those services.
smartd and braid-monitor run opportunistically
Neither smartd nor braid-monitor should wake the system or prevent suspend. They run naturally during wake windows (user access, scrub wakeup). SMART counters accumulate in drive firmware regardless of polling. The only scheduled wakeup is for the monthly btrfs scrub timer.
Paused balance = busy
A paused balance holds the btrfs exclusive-operation lock. The mutating-command preflight in preflight.rs already treats a paused balance as a hard refusal (it can block indefinitely, so braid cannot enqueue behind it). Same logic in braid idle – don’t suspend mid-pause.
WoL managed by braid
braid.autoSuspend.wolInterface is required when sleep is enabled. braid sets networking.interfaces.<iface>.wakeOnLan.enable = true on the specified interface. A build-time assertion prevents enabling sleep without WoL – otherwise the NAS suspends and becomes unreachable until someone physically presses the power button. braid doctor verifies the live NIC reports magic-packet wake (Wake-on: g) for that interface on demand, and autosuspend also runs the hidden braid wol-ready check every suspend cycle. The BIOS-side WoL setting is the user’s responsibility (can’t be automated from NixOS).
Some drivers can reset WoL after resume. braid does not currently re-arm WoL from a system-sleep hook; instead, the autosuspend gate keeps the machine awake after the first wake if Wake-on: g disappears. That is the safe degraded direction: visible and diagnosable via braid doctor, rather than silently sleeping into an unreachable state.
Fully qualified store paths
The ExternalCommand command strings use absolute /nix/store/ paths for timeout, bash, and braid. autosuspend runs the commands outside braid’s wrapper, so PATH is not guaranteed to include these tools.
See
Active – Supersedes 002-config-first-workflow.md. Refined by 024-luks-uuid-identity.md.
Decision: Runtime Disk Membership
Principle: CLI-owned membership
Context
The original design declared disk membership in braid.disks (NixOS config). Adding a drive required editing Nix config, running nixos-rebuild switch, then running braid add <name>. This was wrong: disk membership is operational state (“which drives are in my pool right now”), not system architecture (“what services should run on this machine”). Requiring a rebuild to add a drive added ceremony and created a category error — NixOS config is for declarative system shape, not mutable runtime state.
Decision
Move disk membership to a CLI-owned runtime state file. The NixOS module provides infrastructure (mount point, services, toolchain). The CLI owns which disks are in the pool.
State model
/var/lib/braid/pool.json — CLI-owned membership keyed by LUKS UUID:
{
  "disks": {
    "11111111-1111-1111-1111-111111111111": {
      "name": "toshiba",
      "by_id": "/dev/disk/by-id/ata-TOSHIBA_...",
      "devid": 1,
      "added_at": "2026-03-27T12:00:00Z"
    }
  }
}
The map key is the member’s persistent identity. The name field is the operator-facing disk name used in commands, mapper names, and labels; it is not the identity. by_id is the hardware address used to find the disk before it is opened. devid is live btrfs state captured after membership commits and is only a fallback binding key when btrfs reports a missing or null-underlying device by devid alone. added_at is historical state – once set on a member, it is preserved across all subsequent writes (unlock, recover, replace, add, etc.). These fields replace the former disk-map.json advisory file.
/etc/braid/config.json — machine config (no disk information):
{ "mount_point": "/mnt/storage" }
Standalone CLI installs may keep this minimal shape. Module-generated configs
also include pool_access_group and systemd_lifecycle:
{
  "mount_point": "/mnt/storage",
  "pool_access_group": "storage",
  "systemd_lifecycle": true
}
/var/lib/braid/pending-op.json — pending-operation journal (transient, present only during mutations).
Mutation ordering
All mutating commands validate, write pending-op.json with pre/target membership snapshots, perform the irreversible btrfs membership change, write pool.json to reflect the committed live membership, then advance the journal to a post-maintenance phase before performing any required post-mutation maintenance and clearing the journal.
pool.json reflects committed btrfs membership, not necessarily completion of follow-up maintenance such as RAID1 rebalance or resize. While pending-op.json exists, braid recover is responsible for replaying or completing any owed post-mutation work before clearing the journal when the balance state is safe to interpret. If owed RAID1 replay finds a paused, running, or unknown btrfs balance, recover fails closed and preserves the journal for manual inspection. Recovery in a post-maintenance phase must not rerun the primary btrfs membership command (device add, device remove, or replace start).
For add, membership commits when btrfs device add returns success; the post-add RAID1 balance is follow-up maintenance. For remove, membership commits when btrfs device remove returns success; writing pool.json before that would be wrong because btrfs still owns the device. For remove-missing, membership commits when btrfs device remove <devid> against the missing devid returns success; the post-remove soft balance that restores RAID1 redundancy for chunks created during degraded operation is follow-up maintenance. For replace, membership commits when btrfs replace start -B completes; the post-replace resize, and (for missing-path replacements that clear the last missing device) the soft balance, are follow-up maintenance.
The journal provides crash safety: if braid crashes mid-operation, the journal triggers recovery mode on next invocation. If a crash lands after pool.json was written but before the post-maintenance phase rewrite, braid recover detects the committed live topology, rewrites the journal to the post phase, and then finishes only the owed maintenance unless owed RAID1 replay finds a paused, running, or unknown balance state.
Recovery mode
When pending-op.json exists, braid enters recovery mode. Membership, mount, and key-enrollment commands (add, remove, remove-missing, replace, unlock, enroll, discover --write) hard-fail; read-only diagnostic and cleanup surfaces (status, doctor, lock, bare discover) stay available. braid recover is the only command that clears the journal: it opens LUKS devices, mounts the pool (with --allow-degraded if needed), and rebuilds or repairs membership from the live btrfs pool topology – not from LUKS label scanning, which could include labeled-but-never-added disks.
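The command gate described above can be summarised as a lookup. This is a sketch with a hypothetical function name; only the commands named in the paragraph are covered, and rejecting anything unlisted is an assumption made here to keep the sketch fail-closed.

```rust
/// Whether a command may run while pending-op.json exists.
/// `write_flag` is only meaningful for `discover` (the --write form).
fn allowed_in_recovery_mode(command: &str, write_flag: bool) -> bool {
    match command {
        // The only command that clears the journal.
        "recover" => true,
        // Read-only diagnostic and cleanup surfaces.
        "status" | "doctor" | "lock" => true,
        // Bare discover is allowed; discover --write is not.
        "discover" => !write_flag,
        // add, remove, remove-missing, replace, unlock, enroll hard-fail;
        // unlisted commands are rejected too (assumption of this sketch).
        _ => false,
    }
}
```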
State contract
- pool.json is authoritative. unlock requires it.
- unlock enriches pool.json metadata (devid, added_at, and current by-id observations where appropriate) after mount via live btrfs state, but never changes membership (disk set).
- If pool.json is missing or corrupt, unlock and the mutating membership commands fail with a clear error directing the user to braid add or braid discover --write.
- braid lock – the user-facing command, the braid-online.service ExecStop reentry, and braid lock --dry-run – tolerates a missing or corrupt pool.json: it warns and proceeds with empty membership. The per-candidate cryptsetup luksUUID probe in build_close_sets_* (cli/src/lock.rs) is the fail-closed guard, so cleanup remains complete and correct. No lock pathway hard-fails on an unloadable pool.json; dry-run folds the warning into its stdout preview while the real paths emit it to stderr (see ADR 026).
- If pool.json is readable but stale (a member fails to probe), unlock warns and proceeds with the members it can probe. It never rewrites pool.json.
- If a member’s UUID key doesn’t match the probed device’s LUKS UUID, unlock fatally errors. This catches swapped, reformatted, or corrupted drives before any LUKS open or mount is attempted.
- Only these commands write pool.json membership: add, remove, replace, remove-missing, discover --write, recover.
Recovery
Recovery is always explicit, never implicit:
- braid recover opens LUKS devices and mounts the pool if needed. Mount membership is phase-specific: existing-pool add and remove-missing pool-mutation phases mount from the pre-operation membership, add/remove-missing post phases and replace post-maintenance recovery mount from the committed target membership, and replace pool-mutation, bootstrap-add pool-mutation (empty pre-operation snapshot), and plain remove recovery mount from the admission membership (pre-operation snapshot plus target-only members, which for replace covers an in-flight dev_replace). This is the only path out of recovery mode (journal present). It probes actual pool topology, not LUKS labels. Each live member’s by_id is resolved at recovery time by walking /dev/disk/by-id/ and matching the symlink whose canonical target equals the live device’s backing kernel path – by_id is never copied from the journal snapshot, which can be stale if hardware enumeration changed since the mutation started. If no by-id symlink resolves to a live pool member, recovery hard-fails with an actionable remediation message rather than persisting a guess. When rebuilding pool.json, recover preserves each member’s added_at from the current pool.json if present, else from the journal’s pre/target membership snapshot; only members with no prior timestamp get a fresh now_iso() stamp. by_id, the UUID key, and devid remain live-derived or journal-verified according to the recovery phase. When the pool is already mounted by an external process (circumventing braid unlock’s pending-op preflight) and the journal records Replace::PoolMutation, recovery refuses and directs the operator to braid lock; braid recover so a fresh mount session can be opened and the relock cycle can clear any kernel-resumed-dev_replace staleness. Replace post-maintenance recovery is allowed on an already-mounted pool because the primary replace has already committed.
- braid discover scans /dev/disk/by-id/* for LUKS devices with braid-* labels. Displays what it finds. With --write, persists to pool.json. This is for initial setup recovery (lost pool.json), not for crash recovery.
- The normal path to create pool.json is braid add.
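The added_at precedence that recover applies when rebuilding pool.json (current file first, then the journal snapshot, then a fresh stamp) reduces to a short fallback chain. resolve_added_at is a hypothetical name, and timestamps are passed as plain strings purely for illustration.

```rust
/// Pick the timestamp to persist for one member.
/// `current` is the value from the existing pool.json, `journal` the value
/// from the journal's pre/target snapshot, `now` a freshly generated stamp.
fn resolve_added_at(current: Option<&str>, journal: Option<&str>, now: &str) -> String {
    current.or(journal).unwrap_or(now).to_string()
}
```

Only a member with neither source present receives the fresh stamp, so an existing added_at is never overwritten by this step.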
CLI syntax
braid add takes name=by_id positional pairs:
braid add toshiba=/dev/disk/by-id/ata-TOSHIBA wd=/dev/disk/by-id/ata-WDC
braid replace --new takes the same format:
braid replace --old toshiba --new seagate=/dev/disk/by-id/ata-Seagate_NEW
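Parsing a name=by_id pair could be sketched as follows. parse_disk_arg is a hypothetical helper, and the validation rules (non-empty name, path under /dev/disk/by-id/) are assumptions for illustration rather than braid's documented checks.

```rust
/// Split one positional argument into (name, by_id).
fn parse_disk_arg(arg: &str) -> Result<(String, String), String> {
    // Split on the first '=' only, so the path may itself contain '='.
    let (name, by_id) = arg
        .split_once('=')
        .ok_or_else(|| format!("expected name=by_id, got '{}'", arg))?;
    if name.is_empty() {
        return Err(format!("empty disk name in '{}'", arg));
    }
    // Assumed rule: only stable by-id paths are accepted.
    if !by_id.starts_with("/dev/disk/by-id/") {
        return Err(format!("'{}' is not a /dev/disk/by-id/ path", by_id));
    }
    Ok((name.to_string(), by_id.to_string()))
}
```

Applied to the first example above, toshiba=/dev/disk/by-id/ata-TOSHIBA yields the name toshiba and the by-id path unchanged.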
Lifecycle model
The NixOS module no longer generates data-pool fileSystems, LUKS entries, or btrfs-device-scan. Instead:
- braid-online.service – lifecycle owner (ExecStop=braid lock, RemainAfterExit=yes). Started by Rust dispatch via mark_online after a successful unlock, add, or recover that leaves the pool mounted, gated on systemd_lifecycle = true in runtime config.
- braid-pool.target – wants unlock only, does not start braid-online directly.
- Consumer services bind to mnt-storage.mount (auto-generated by systemd from /proc/mounts).
Rejected alternatives
- Keep braid.disks but make it optional – half-measure that leaves two sources of truth. Users would be confused about which one matters.
- Auto-discover on unlock – makes unlock a mutation command. If discovery finds the wrong devices (e.g., a test disk with a braid-* label), the pool is corrupted silently. Explicit membership is safer.
- Store membership in btrfs metadata – btrfs doesn’t have a user-data field on devices. Would require a convention (e.g., subvolume with a JSON file), adding fragility and a chicken-and-egg problem for unlock.
Consequences
- Adding a drive is one command: braid add name=/dev/disk/by-id/.... No nixos-rebuild.
- pool.json must exist before unlock can run. First-time setup: braid add creates it.
- braid discover --write is the explicit recovery path for lost/corrupt pool.json.
- The NixOS module’s braid.disks option is removed entirely.
See
- cli/src/membership.rs – load/save/validate membership, DiskMember, PoolMembership, enrich_from_pool_state, foreign_luks_uuids (pure helper consumed by braid doctor’s foreign_luks_uuid check)
- cli/src/journal.rs – pending-operation journal (pre/target membership snapshots)
- cli/src/recover.rs – rebuild membership from live pool state
- cli/src/preflight.rs – check_no_pending_operation recovery mode guard
- cli/src/discover.rs – LUKS label scanning
- modules/braid/storage.nix – braid-online.service, no data-pool fileSystems
- modules/braid/options.nix – no braid.disks
Decision: Systemd Lifecycle State Machine
Principle: Resilient by default
Context
braid needs systemd integration for three things: interactive unlock, unattended unlock, and clean shutdown (LUKS close before power-off). The module must not generate data-pool fileSystems or boot.initrd.luks.devices entries — those create hard boot dependencies on the data pool (see 003-resilient-boot.md). Instead, the CLI owns LUKS open/close and btrfs mount/unmount at runtime, and a thin systemd layer provides the entry points and shutdown hook.
Units
┌─────────────────────┐
│ braid-pool.target │ entry point
│ wants + after │
└─────────┬────────────┘
│ (soft dep)
┌─────────▼────────────┐
│ braid-unlock.service │ interactive passphrase
│ oneshot │
└─────────┬────────────┘
│ (CLI marks online on success)
┌─────────▼────────────┐
│ braid-online.service │ lifecycle owner
│ ExecStart=/bin/true │
│ ExecStop=braid lock │ --systemd-stop
│ oneshot, RAE │
└──────────────────────┘
braid-auto-unlock.service (alternative unlock path, boot-time)
wantedBy multi-user.target activates braid-online via same CLI path
mnt-storage.mount (auto-generated by systemd from /proc/mounts)
braid-monitor.timer -> braid-monitor.service -> braid-alert.service
ConditionPathIsMountPoint (health polling, skipped when pool not mounted)
braid-scrub.timer -> braid-scrub.service
braid-online.service -> braid-scrub-resume-trigger.service -> braid-scrub.service
BindsTo + After braid-online.service (lifecycle-bound periodic scrub)
Persistent=true (catch-up on activation)
RAE = RemainAfterExit = true
braid-pool.target — entry point
Public handle for “bring pool online.” User runs systemctl start braid-pool.target.
- wants (not requires) braid-unlock.service – soft dependency. Unlock failure does not fail the target, and the target cannot block boot because nothing requires it.
- after braid-unlock.service – ordering only.
- Does not want or require braid-online.service. The CLI activates that separately after confirming the mount succeeded.
braid-unlock.service — interactive passphrase unlock
Single orchestrator: opens all LUKS devices and mounts the btrfs pool in one shot. Guarantees exactly one passphrase prompt (avoids relying on systemd-ask-password cache behavior across multiple LUKS units).
- Type = oneshot – runs once, returns to inactive on completion. ConditionPathIsMountPoint (below) prevents re-run while mounted; the inactive state allows systemctl start braid-pool.target to re-unlock after a prior braid lock.
- ConditionPathIsMountPoint = !${mountPoint} – skips if pool already mounted.
- Calls systemd-ask-password --timeout=0 --id=braid | braid unlock --passphrase-stdin.
braid-auto-unlock.service — unattended USB keyfile unlock
Optional (only created when braid.autoUnlock.enable = true). Runs at boot, unlocks from a USB keyfile without interactive prompt.
- wantedBy = [ "multi-user.target" ] – starts automatically at boot.
- after = [ "local-fs.target" ] – waits for /run to exist.
- ConditionPathIsMountPoint = !${mountPoint} – skips if pool already mounted.
- No RemainAfterExit – intentional. If USB is absent at boot (service exits 0 on skip), a later systemctl start braid-auto-unlock can re-run when the USB is inserted.
- Mounts USB read-only, validates keyfile path (symlink defense), runs braid unlock --key-file, always unmounts USB after (never leaves keyfile accessible).
- Always exits 0 – failures are logged to the journal but never reported as unit failure, because auto-unlock must not block boot under any circumstance.
braid-online.service — lifecycle owner
State-ownership service. Its only purpose is to mark “pool is online” and run the bounded braid lock stop path on stop.
- ExecStart = /bin/true – no work. Exists for its ExecStop hook.
- ExecStop = braid lock --systemd-stop --deadline-secs <n> – unmounts pool and closes all LUKS on shutdown or manual stop with a bounded stop-coordinator/pool-lock wait below TimeoutStopSec. In this mode, braid permits a running or paused btrfs balance: a running balance is explicitly paused before unmount, an already-paused balance proceeds to unmount, and every other exclusive operation is refused. If the blocking btrfs balance userspace process briefly holds the mount fd after its parent dies, the systemd-stop path uses a longer transient-busy umount retry than plain braid lock.
- RemainAfterExit = true – persists “active” state.
- ConditionPathIsMountPoint = ${mountPoint} – systemd skips activation when the pool is not mounted (systemctl start returns 0 but the unit stays inactive). Defense-in-depth: the CLI’s mountpoint -q check is the primary gate, but this condition prevents direct systemctl start from leaving the unit active while unmounted.
- TimeoutStopSec = 300s – raises the stop timeout from the 90s default so a slow braid lock is not SIGKILL’d mid-operation.
- Not in any dependency chain. Neither the target nor unlock services want/require it. Activated exclusively by the CLI after mountpoint -q confirms the pool is mounted.
mnt-storage.mount — readiness contract
Auto-generated by systemd from /proc/mounts when the btrfs pool is mounted. Consumer services bind to this unit.
braid-monitor.timer + braid-monitor.service — health polling
Periodic oneshot (default: every 5 minutes). Pure detector — checks btrfs device stats for errors.
- ConditionPathIsMountPoint – skipped cleanly when pool is not mounted (no dependency-failure noise from timer).
- No After or BindsTo on mnt-storage.mount – those directives force systemd to load the unit, which doesn’t exist before the first unlock.
- Exit code 1 from braid monitor → starts braid-alert.service.
- braid monitor fails closed: probe/parse/stats/mountinfo failures, acked-stats.json baseline read/parse failures, and alert-latch read/quarantine failures latch AlertCause::ComputationError and exit 1, so the wrapper above starts the beeper. Exit 0 is reserved for healthy, pool-offline, and pool-lock-contended cycles; exit 2 is reserved for pre-cmd_monitor setup failures (e.g. pool-lock I/O, config load failure) and is never emitted by cmd_monitor itself. See ADR 014 fail-closed contract for the cause taxonomy.
- The gate and the fail-closed path are independent mount checks, so the gate cannot mask a real alert. ConditionPathIsMountPoint resolves through statx(STATX_ATTR_MOUNT_ROOT) (then name_to_handle_at(2), then /proc/self/fdinfo) – a kernel VFS query, never a parse of /proc/self/mountinfo text. The fail-closed path above instead parses that text and latches ComputationError on a malformed line, duplicate target, or read error. On a genuinely-mounted pool statx reports a mount root regardless of any text anomaly, so the service runs and the beep fires – the protective beep is never gated away. The gate only short-circuits a statx-confirmed-offline pool; the sole beep it suppresses is braid’s conservative ComputationError on an offline pool with anomalous mountinfo text, which is not a disk-health alert.
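The exit-code contract above can be summarized as a small classifier. This is a minimal sketch, not braid's code: `MonitorOutcome` and `classify_monitor_exit` are hypothetical names, and because the text does not say how the wrapper reacts to exit 2, the sketch only labels that case.

```rust
/// Hypothetical classification of `braid monitor` exit codes, following
/// the contract described in the text above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MonitorOutcome {
    /// Exit 0: healthy, pool offline, or pool lock contended. No alert.
    NoAlert,
    /// Exit 1: a disk-health finding or a fail-closed computation error.
    /// The wrapper starts braid-alert.service.
    StartAlert,
    /// Exit 2: setup failed before cmd_monitor ran (pool-lock I/O, config load).
    SetupFailure,
    /// Any other code is outside the documented contract.
    Unexpected(i32),
}

pub fn classify_monitor_exit(code: i32) -> MonitorOutcome {
    match code {
        0 => MonitorOutcome::NoAlert,
        1 => MonitorOutcome::StartAlert,
        2 => MonitorOutcome::SetupFailure,
        other => MonitorOutcome::Unexpected(other),
    }
}
```

Keeping exit 1 as the only alert-raising code means a skipped or contended timer cycle can never start the beeper by accident.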
braid-scrub.timer + scrub service + resume trigger – lifecycle-bound scrub
Periodic scrub (default: monthly). Uses a timer-lifecycle pattern distinct from the monitor’s ConditionPathIsMountPoint-only approach.
- Timer is wantedBy, BindsTo, and After braid-online.service. Starts when pool comes online, stops when pool goes offline.
- Persistent=true + AccuracySec=1d. When the timer activates (pool unlock), systemd compares the last-trigger stamp against OnCalendar. If a scrub was overdue during the offline period, it fires immediately.
- braid-scrub.service is the only foreground scrub runner. It is Type=simple; its internal braid scrub-resume-or-start --mount <mount> ExecStart resumes saved scrub progress first, then starts a fresh scrub only when btrfs reports nothing resumable.
- braid-scrub.service uses a shared ExecStop cancel script – same pattern as the nixpkgs btrfs scrub service. This cancels in-flight scrub on lock or shutdown through btrfs scrub cancel, leaving btrfs-progs’ /var/lib/btrfs/scrub.status.<fsid> progress file available for the next resume.
- braid-scrub-resume-trigger.service is the pool-online predicate-and-poke path. It is Type=oneshot, wantedBy, BindsTo, and After braid-online.service; it runs internal braid scrub-needs-resume --mount <mount> and starts braid-scrub.service with systemctl start --no-block only when saved progress is resumable.
- The scrub service and resume trigger use BindsTo + After braid-online.service. On shutdown or systemctl stop braid-online.service, systemd stops them before braid lock runs.
- ConditionPathIsMountPoint on the scrub service and trigger is defense-in-depth.
- Serialization via single runner. Only braid-scrub.service ever runs btrfs scrub; both activation paths (timer and trigger) issue systemctl start braid-scrub.service, and systemd coalesces overlapping starts for the same unit. A completed scrub-resume-or-start run satisfies both an overdue timer fire and a pool-online resumable state, with no flock and no /run/braid-scrub.lock.
- Conflicts + Before shutdown.target and sleep.target on the scrub service. The short-lived resume trigger also uses Conflicts + Before sleep.target so suspend setup wins cleanly against pool-online activation.
braid-alert.service — notification
Started by monitor on error detection. Beeps via PC speaker (if enabled) and/or runs a custom alert command. Stopped by braid ack.
Rust dispatch as synchronization layer
The wrapper (braid-wrapper.sh) is a pure exec shim: it sets the module-controlled PATH and execs the Rust binary. Synchronization lives in Rust dispatch (cli/src/main.rs), which owns the pool lock, braid-online.service lifecycle updates, and shutdown stop coordination. See 026-pool-lock-rust-owned.md.
modules/braid/cli.nix emits systemd_lifecycle = true for module-managed
installs. Standalone CLI deployments omit it; those configs still get mount
permission fixups but do not touch braid-online.service.
After every unlock, add, or recover attempt:
1. Rust dispatch acquires /run/braid-pool.lock, loads config and membership, and snapshots braid-online.service ActiveState only when systemd_lifecycle = true.
2. CLI opens LUKS + mounts pool when the command reaches its mount step. (recover self-mounts when recovering from an interrupted operation.)
3. Before dispatch returns, success or failure, Rust runs mark_online while the pool lock is still held.
4. mark_online checks mountpoint -q; pre-mount failures short-circuit here.
5. Rust sets permissions (root:poolAccessGroup 2770) if poolAccessGroup is configured.
6. When systemd_lifecycle = true, Rust starts braid-online.service only when the initial snapshot was inactive or failed.
7. If activation fails: prints WARNING to stderr, then preserves the command’s original exit result. Pool is mounted and usable; only the shutdown hook is missing.
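The post-command decisions described above (mount check, permission fixup, conditional start) can be modelled as a pure function. The sketch below is illustrative only: `Snapshot`, `PostStep`, and `plan_mark_online` are hypothetical names, and passing `None` for the snapshot stands in for a standalone deployment where systemd_lifecycle is not set.

```rust
/// Hypothetical mirror of the unit's ActiveState at the start of dispatch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Snapshot {
    Active,
    Activating,
    Deactivating,
    Inactive,
    Failed,
}

/// What the post-command step should do while the pool lock is still held.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PostStep {
    pub set_permissions: bool,
    pub start_online_unit: bool,
}

/// `snapshot` is None when systemd_lifecycle is not enabled (assumption of
/// this sketch): permissions are still fixed up, the unit is never touched.
pub fn plan_mark_online(
    mounted: bool,
    access_group_configured: bool,
    snapshot: Option<Snapshot>,
) -> PostStep {
    // A failed or skipped mount short-circuits everything.
    if !mounted {
        return PostStep { set_permissions: false, start_online_unit: false };
    }
    PostStep {
        set_permissions: access_group_configured,
        // Start only from a settled "off" state recorded in the snapshot.
        start_online_unit: matches!(
            snapshot,
            Some(Snapshot::Inactive) | Some(Snapshot::Failed)
        ),
    }
}
```

A failure of the resulting start is a warning rather than an error, so the command's original exit result is what the caller sees.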
On lock:
1. Plain braid lock acquires /run/braid-stop-coordinator.lock, then /run/braid-pool.lock.
2. When systemd_lifecycle = true, Rust stops braid-scrub.timer, braid-scrub-resume-trigger.service, then braid-scrub.service (timer first prevents re-trigger; trigger before service prevents the trigger from queuing a fresh start of the service being stopped; service last cancels in-flight scrub).
3. When systemd_lifecycle = true, Rust iterates systemctl show -P BoundBy braid-online.service and stops each remaining bound consumer (samba, nfs, future). The scrub units already handled in step 2 are skipped. This mirrors the cascade systemd performs on shutdown for user-initiated braid lock.
4. CLI unmounts pool + closes LUKS.
5. Plain braid lock writes done\n to /run/braid-stop-coordinator.lock.
6. When systemd_lifecycle = true, Rust checks the mount is gone and runs systemctl stop braid-online.service synchronously so the command returns only after the lifecycle owner is inactive. The synchronous stop runs only when the post-cleanup mountpoint check confirms the mount is gone; if the check itself fails, Rust warns and skips the stop, leaving the unit active for the operator to retry. The recursive ExecStop reentry polls the coordinator, observes done\n, and exits 0.
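The pre-unmount stop ordering (scrub units in a fixed order, then the remaining bound consumers) can be sketched as a function over the BoundBy property text. This is an illustration under assumptions: `pre_unmount_stop_order` is a hypothetical helper, and its input is taken to be the whitespace-separated unit list that systemctl show -P BoundBy prints.

```rust
/// Fixed order from the text: timer first (no re-trigger), then the resume
/// trigger (cannot queue a fresh start), then the scrub service itself.
const SCRUB_STOP_ORDER: [&str; 3] = [
    "braid-scrub.timer",
    "braid-scrub-resume-trigger.service",
    "braid-scrub.service",
];

/// Build the list of units to stop before unmount.
/// `bound_by` is the raw property value, e.g. "samba-smbd.service nfs-server.service".
pub fn pre_unmount_stop_order(bound_by: &str) -> Vec<String> {
    let mut order: Vec<String> = SCRUB_STOP_ORDER.iter().map(|u| u.to_string()).collect();
    for unit in bound_by.split_whitespace() {
        // Skip scrub units already queued, and any duplicate consumer.
        if !order.iter().any(|seen| seen.as_str() == unit) {
            order.push(unit.to_string());
        }
    }
    order
}
```

Because the scrub units always lead the list, the result does not depend on the order in which systemd reports the bound units.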
On system shutdown:
1. systemd stops braid-online.service (if active); its BindsTo + After cascade stops the scrub units and any full-triad consumer first. ExecStop then re-runs the same scrub-stop + BoundBy iteration as the “On lock” steps 2-3. For the scrub units and any consumer that follows the documented WantedBy + BindsTo + After triad, the cascade has already stopped them, so these re-issued stops are no-ops. A consumer that declares BindsTo without After has no stop-ordering guarantee and may still be active when ExecStop runs, so the explicit blocking stop here is what frees the mount. Running the pre-steps unconditionally covers both cases, keeping teardown code-owned and independent of cascade ordering.
2. ExecStop = braid lock --systemd-stop --deadline-secs <n> waits for an in-flight plain braid lock to finish through the stop coordinator, or waits for the pool lock up to the configured deadline.
3. Lock dispatch loads membership from pool.json; if pool.json is absent or corrupt, it warns and proceeds with empty membership because mapper cleanup still requires per-candidate LUKS UUID verification.
4. CLI unmounts and closes LUKS. If sysfs reports a running btrfs balance, --systemd-stop first runs btrfs balance pause so the kernel persists the paused balance before LUKS close; if sysfs reports an already-paused balance, teardown proceeds directly to unmount. Next-boot braid recover fails closed on that persisted paused balance and preserves pending-op.json for manual inspection instead of resuming it. Plain braid lock still refuses all active exclusive operations. The systemd-stop path also retries transient umount EBUSY longer than user lock so a surviving btrfs balance process can release its mount fd during shutdown.
5. Drives are safe to power off.
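The difference between plain lock and the systemd-stop mode when an exclusive operation is present can be written as a decision table. The sketch below is a hypothetical model (`LockMode`, `ExclusiveOp`, `Teardown`, and `plan_teardown` are invented names). It also assumes that plain braid lock treats an already-paused balance as an active exclusive operation and refuses it, which the text does not state explicitly.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LockMode {
    Plain,
    SystemdStop,
}

/// Exclusive-operation state as it might be read from sysfs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExclusiveOp {
    Idle,
    BalanceRunning,
    BalancePaused,
    Other,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Teardown {
    Unmount,
    PauseBalanceThenUnmount,
    Refuse,
}

pub fn plan_teardown(mode: LockMode, op: ExclusiveOp) -> Teardown {
    match (mode, op) {
        // Nothing exclusive is running: both modes go straight to unmount.
        (_, ExclusiveOp::Idle) => Teardown::Unmount,
        // Shutdown path: pause a running balance so the kernel persists it.
        (LockMode::SystemdStop, ExclusiveOp::BalanceRunning) => Teardown::PauseBalanceThenUnmount,
        // Shutdown path: an already-paused balance needs no extra step.
        (LockMode::SystemdStop, ExclusiveOp::BalancePaused) => Teardown::Unmount,
        (LockMode::SystemdStop, ExclusiveOp::Other) => Teardown::Refuse,
        // Plain lock refuses every exclusive operation (paused balance
        // included, by assumption of this sketch).
        (LockMode::Plain, _) => Teardown::Refuse,
    }
}
```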
Pool lock mutual exclusion
Pool mutators, alert-state mutators, key enrollment, lock, and discover --write (unlock, add, recover, remove, remove-missing, replace, enroll, lock, discover --write, ack, monitor) acquire an exclusive flock on /run/braid-pool.lock in Rust dispatch before reading pool state. unlock, add, recover, remove, remove-missing, replace, enroll, lock, and discover --write are non-blocking fail-fast commands: if the lock is already held by another braid process, the CLI exits 1 immediately with braid: another braid operation is already in progress and the user must retry once the active operation completes. Bare discover is read-only and does not acquire the lock. ack waits up to 10 seconds before returning a retry message. monitor exits 0 silently on contention so a skipped timer cycle does not start alert notification. The lock is held through post-processing (permissions, braid-online activation/deactivation). Under the held lock, unlock re-checks whether the pool is already mounted and exits cleanly if a prior winner mounted it sequentially; other mutators operate on current locked state. See Principle 12.
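The per-command contention behaviour described above fits in one lookup. This is a minimal sketch with hypothetical names (`Contention`, `contention_policy`); it covers plain invocations only and returns `None` for any command the paragraph does not classify.

```rust
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Contention {
    /// The command never takes the pool lock.
    NoLock,
    /// Exit 1 immediately with the "already in progress" message.
    FailFast,
    /// Wait up to the given duration, then return a retry message.
    Wait(Duration),
    /// Exit 0 silently so a skipped timer cycle raises no alert.
    SkipSilently,
}

/// `write_flag` models `discover --write`; it is ignored for other commands.
/// Plain `lock` only: the --systemd-stop variant uses a bounded wait instead.
pub fn contention_policy(command: &str, write_flag: bool) -> Option<Contention> {
    match command {
        "unlock" | "add" | "recover" | "remove" | "remove-missing" | "replace"
        | "enroll" | "lock" => Some(Contention::FailFast),
        "discover" if write_flag => Some(Contention::FailFast),
        "discover" => Some(Contention::NoLock),
        "ack" => Some(Contention::Wait(Duration::from_secs(10))),
        "monitor" => Some(Contention::SkipSilently),
        // Not classified by the paragraph above.
        _ => None,
    }
}
```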
Lock acquisition site
For non-dry-run pool mutators, alert-state mutators, key enrollment, lock, and discover --write, the operation lock is acquired in cli/src/main.rs dispatch before config load, pool.json load, journal read, identity probes, subprocess health probes, or interactive prompts. The shell wrapper must not acquire /run/braid-pool.lock; it execs the Rust binary and leaves critical-section ownership to dispatch.
A command started during another mutator could otherwise read stale state, then acquire the lock after the first command finishes and act on old inputs. Late acquisition also regresses the fail-fast UX – users see prompts and probes complete before being told the operation is contended.
The pool lock is the first real execution boundary. Do not model it after the sleep inhibitor’s late-acquisition pattern: the inhibitor protects against suspend mid-operation and can wait until the irreversible window; the pool lock protects against state-staleness and must precede any read of pool state.
ExecStop bounded-wait pattern
When a unit’s ExecStop= invokes a CLI that needs a contended resource (e.g. braid-online.service ExecStop=braid lock colliding with an in-flight mutator that holds the pool lock), the ExecStop path gets a distinct bounded-wait variant – not a fail-fast call. “ExecStop fails fast; in-flight work finishes and a later stop attempt succeeds” is not a valid design: during shutdown there is no later stop attempt. systemctl poweroff can leave the resource (mounted btrfs / open LUKS) in an inconsistent state, and the “in-flight mutator finishes before TimeoutStopSec” claim is not guaranteed.
Current pattern: braid-online.service runs braid lock --systemd-stop --deadline-secs ${braid.lockSystemdStopDeadlineSecs}. The module default is 270 seconds and an assertion requires it to be strictly less than braid-online.service TimeoutStopSec (300 seconds). That deadline bounds only stop-coordinator and pool-lock acquisition; once lock cleanup reaches btrfs balance pause or umount, any kernel wait to quiesce btrfs has no userspace timeout and is bounded only by the unit’s TimeoutStopSec (300 seconds). The systemd-stop path also has a longer transient-busy umount retry (60 attempts at 500ms) because btrfs-progs holds the mount fd while blocked in BTRFS_IOC_BALANCE_V2 and can survive the Rust parent briefly during shutdown. Regular braid lock stays fail-fast for user invocations; the bounded-wait path is documented and tested as a distinct mode.
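The numbers in this pattern can be checked with a few lines of arithmetic. The sketch below uses only the values quoted above; the constant and function names are hypothetical, and the retry budget ignores the time each umount attempt itself takes.

```rust
use std::time::Duration;

/// Values quoted in the text above.
pub const TIMEOUT_STOP_SECS: u64 = 300;
pub const DEFAULT_DEADLINE_SECS: u64 = 270;
pub const UMOUNT_RETRY_ATTEMPTS: u32 = 60;
pub const UMOUNT_RETRY_INTERVAL: Duration = Duration::from_millis(500);

/// Mirror of the module assertion: the lock-wait deadline must be strictly
/// less than the unit's TimeoutStopSec.
pub fn validate_deadline(deadline_secs: u64, timeout_stop_secs: u64) -> Result<(), String> {
    if deadline_secs < timeout_stop_secs {
        Ok(())
    } else {
        Err(format!(
            "deadline {}s must be strictly less than TimeoutStopSec {}s",
            deadline_secs, timeout_stop_secs
        ))
    }
}

/// Upper bound on time spent sleeping between transient-busy umount retries.
pub fn umount_retry_budget() -> Duration {
    UMOUNT_RETRY_INTERVAL * UMOUNT_RETRY_ATTEMPTS
}
```

With the quoted values the retry sleeps add up to at most 30 seconds, on top of whatever part of the 270-second acquisition deadline was consumed.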
systemctl start/stop inside held-resource windows
systemctl start <unit> on an already-active oneshot+RemainAfterExit unit is a no-op at the work level, but it still queues a job. If a stop job for the same unit is already in flight (because someone else invoked systemctl stop), the start queues behind the stop. If that stop’s ExecStop= is itself blocked on a resource the caller holds, the result is a deadlock.
This is load-bearing for any CLI that both holds a resource and uses systemctl start/stop on a unit whose ExecStart=/ExecStop= touches that resource (e.g. Rust dispatch holding pool.lock while activating braid-online.service whose ExecStop calls braid lock).
These rules govern start/stop of braid-online.service itself. The
systemctl stop calls in run_lock_pre_steps target bound consumers and scrub
units, not the lifecycle owner, so they queue no job against
braid-online.service and the start-behind-stop deadlock above does not apply
to them.
Rules:
- Snapshot full unit state at the start of the held-resource window with systemctl show -P ActiveState <unit>. Do NOT use systemctl is-active – it returns “active” only for active, classifying activating and deactivating as not-active. A deactivating unit (its ExecStop is already running and waiting on the held resource) snapshotted as “not active” leads the caller to issue a start that queues behind the in-flight stop – the exact deadlock the snapshot was supposed to prevent.
- Only emit systemctl start <unit> at the end of the window if the snapshot was inactive or failed. Skip when active, activating, or deactivating. See ADR 026 snapshot rule.
- Only emit systemctl stop <unit> at the end of the window if the snapshot was active or activating. Skip when inactive, failed, or deactivating.
  - Exception: plain braid lock’s post-success mark_offline runs a synchronous systemctl stop braid-online.service without a stop-side snapshot. It is safe because /run/braid-stop-coordinator.lock plus the done\n protocol guarantees the recursive ExecStop reentry exits 0 once plain braid lock has finished cmd_lock, instead of queuing behind the in-flight stop. This coordinator is the mechanism that replaces the stop-side snapshot gate for mark_offline; see ADR 026 stop coordinator. mark_offline skips the synchronous stop when the post-cleanup mountpoint -q check itself fails (e.g. OnlineError::Spawn mid-shutdown): the unit stays active and the operator retries. Treating unknown mount state as still-mounted mirrors mark_online’s start-side fail-safe.
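The snapshot rules above reduce to two predicates over the unit's ActiveState. The following is a minimal sketch rather than the code in cli/src/main.rs: `ActiveState`, `parse_active_state`, `should_start`, and `should_stop` are hypothetical names, and mapping any unlisted state to `None` (so that neither start nor stop is issued) is an assumption of the sketch.

```rust
/// The ActiveState values the rules above distinguish.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ActiveState {
    Active,
    Activating,
    Deactivating,
    Inactive,
    Failed,
}

/// Parse the output of `systemctl show -P ActiveState <unit>`.
/// States not named in the rules map to None (assumption of this sketch).
pub fn parse_active_state(raw: &str) -> Option<ActiveState> {
    match raw.trim() {
        "active" => Some(ActiveState::Active),
        "activating" => Some(ActiveState::Activating),
        "deactivating" => Some(ActiveState::Deactivating),
        "inactive" => Some(ActiveState::Inactive),
        "failed" => Some(ActiveState::Failed),
        _ => None,
    }
}

/// Start only from a settled "off" state.
pub fn should_start(snapshot: ActiveState) -> bool {
    matches!(snapshot, ActiveState::Inactive | ActiveState::Failed)
}

/// Stop only when the unit is up or coming up.
pub fn should_stop(snapshot: ActiveState) -> bool {
    matches!(snapshot, ActiveState::Active | ActiveState::Activating)
}
```

The deactivating state is the case that matters: both predicates return false for it, so the caller never queues a job behind an ExecStop that is waiting on the resource the caller holds.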
Consumer dependency contracts
Services that depend on the pool being mounted use one of three patterns:
Frequent periodic services (monitor): ConditionPathIsMountPoint only. Neither After nor BindsTo on mnt-storage.mount – those directives force systemd to load the unit, which doesn’t exist until the CLI mounts the pool at runtime (auto-generated from /proc/mounts). The condition gate silently skips the service when unmounted. Fires every 5 minutes – missed fires are cheap, so lifecycle binding is unnecessary.
Infrequent periodic services (scrub): The timer, scrub service, and resume trigger use BindsTo + After on braid-online.service; the timer and trigger are wantedBy the online unit. The timer’s active lifecycle matches the pool’s online period. Persistent=true handles catch-up for overdue fires. Unlike the monitor timer (which fires every 5 minutes and can afford missed runs), the monthly scrub timer cannot wait until next month if it misses – lifecycle binding ensures it fires on the next unlock. The scrub service and resume trigger also get ConditionPathIsMountPoint as defense-in-depth. For manual lock, Rust dispatch stops the timer, resume trigger, and scrub service before unmount (see above).
Long-running services holding open files (samba, nfs): Use the full WantedBy=braid-online.service + BindsTo=braid-online.service + After=braid-online.service triad (same shape as the scrub timer above), plus ConditionPathIsMountPoint=<pool mount>. BindsTo + After ensures systemd stops them before braid lock runs ExecStop, preventing unmount failures from busy filesystems; WantedBy ensures they restart automatically when braid unlock reactivates braid-online.service. The triad handles the unlock-start and lock-stop lifecycle, but these consumers carry their own boot or direct-start edges – NixOS wants samba-smbd.service from samba.target and nfs-server.service from multi-user.target. For starts not initiated by braid-online.service, ConditionPathIsMountPoint is the load-bearing gate that prevents serving an offline mount directory. Rust dispatch iterates BoundBy braid-online.service and stops these consumers before unmount, mirroring the cascade systemd performs on shutdown for user-initiated lock. See ../../guides/sharing-and-permissions.md#binding-shares-to-the-pool-lifecycle for the user-facing example.
Key design constraints
- No hard boot dependencies. wants everywhere, never requires. Pool failure never blocks boot.
- Rust-synchronized lifecycle. For dispatch-managed operations, Rust keeps braid-online synchronized with pool mount state: it activates the service only after mountpoint -q succeeds, and deactivates it after a successful lock. ConditionPathIsMountPoint on the unit is defense-in-depth against direct systemctl start when unmounted. Out-of-band mount or unmount bypasses dispatch and can leave braid-online stale; braid lock handles already-unmounted pools gracefully.
- One passphrase prompt. braid-unlock.service is the sole interactive prompt source. The CLI opens all LUKS devices from that single passphrase.
- Graceful degradation. If braid-online activation fails, the pool is still mounted and usable – only the shutdown hook is missing (warned to stderr).
- One pool operation at a time. Enforced by a non-blocking flock in Rust dispatch, not wrapper logic or unit topology – concurrent attempts are rejected, not queued. See Principle 12.
See
- modules/braid/storage.nix – unit definitions
- modules/braid/monitor.nix – monitor/alert units
- modules/braid/braid-wrapper.sh – pure exec shim
- 026-pool-lock-rust-owned.md – Rust-owned pool lock and lifecycle synchronization
- 003-resilient-boot.md – why no hard dependencies
- 017-runtime-disk-membership.md – lifecycle model context
- tests/module/systemd-lifecycle.py – state machine test suite
Inhibit Sleep During Non-Interruptible Operations
Principles:
Context
braid enables whole-system suspend via autosuspend. That is the right default for a quiet, low-power NAS, but it creates a failure mode for long-running storage operations that should not be interrupted mid-flight.
btrfs replace is the motivating example. Upstream btrfs explicitly warns that suspend/hibernate can interrupt device replace and recommends inhibiting sleep before running it. On newer kernels, suspend can cancel the replace outright; on older kernels, suspend can leave braid to recover a broken topology after wake. The same risk profile applies to btrfs device remove (long-running data migration) and to the conditional balances in add and remove-missing (pool_balance_raid1 after add to ≥2 disks; maybe_restore_raid1 after clearing the last missing device).
braid needs a clear rule for when to hold a sleep inhibitor, because “just acquire it for the whole command” is too broad:
- It is unnecessary and user-hostile to block suspend while waiting for confirmation or passphrase entry.
- It is correct to block suspend once the command is entering the non-interruptible mutation window where interruption risks corruption, degraded topology, or restarting hours of work.
systemd guidance
braid follows systemd’s inhibitor model directly:
- systemd-inhibit is for work that should not be interrupted, such as recording media or similarly sensitive long-running operations.
- block inhibitors are for cases where sleep must be refused outright while the critical section is active.
- delay inhibitors are for short grace periods where a service needs time to prepare for sleep, not for hours-long work.
- Inhibitors should be held only for the shortest window that actually needs protection.
Primary references:
- systemd-inhibit(1): https://www.freedesktop.org/software/systemd/man/systemd-inhibit.html
- systemd Inhibitor Locks: https://systemd.io/INHIBITOR_LOCKS/
Decision
braid acquires a What=sleep, Mode=block inhibitor only for the non-interruptible portion of a long-running operation.
The inhibitor boundary is:
- Run interactive prompts, passphrase collection, and reversible validation first.
- Acquire the sleep inhibitor immediately before the irreversible mutation window begins.
- Keep it held for the full duration of the non-interruptible work, including any required follow-up work that is part of the same intent command.
- Release it immediately when that critical section ends, whether by success, error, or signal-driven unwind.
braid must not hold a sleep inhibitor during:
- confirmation prompts
- passphrase entry
- dry-run output
- reversible preflight that can fail without leaving partial state
Current application
braid replace, braid remove, braid remove-missing, and braid add all hold a What=sleep, Mode=block, Who=braid logind inhibitor for their respective mutation windows. Each command acquires the inhibitor immediately before journal::write_journal(), after all interactive/reversible work, and holds it until the function returns (success, error, or signal-driven unwind).
For all four commands, the protected scope is the post-journal critical section, and the excluded scope is the same:
- --dry-run
- confirmation prompt
- passphrase reads
- reversible validation and identity checks
Failure to acquire the inhibitor returns a Validation-shaped error before the journal is written, so an environmental logind failure does not strand the user in recovery mode.
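The boundary rule (prompts first, inhibitor immediately before the journal write, release when the function returns) maps naturally onto a scope guard. The sketch below is a toy model, not braid's implementation: `SleepInhibitor`, `run_mutation`, and the event log are hypothetical, no logind call is made, and only release on a normal success or error return is modelled.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Shared event log so the ordering can be inspected.
type Log = Rc<RefCell<Vec<&'static str>>>;

/// Hypothetical stand-in for a What=sleep, Mode=block inhibitor handle.
struct SleepInhibitor {
    log: Log,
}

impl SleepInhibitor {
    /// `logind_available` simulates whether acquisition can succeed.
    fn acquire(log: &Log, logind_available: bool) -> Result<Self, String> {
        if !logind_available {
            return Err("validation: could not acquire sleep inhibitor".to_string());
        }
        log.borrow_mut().push("inhibitor acquired");
        Ok(SleepInhibitor { log: Rc::clone(log) })
    }
}

impl Drop for SleepInhibitor {
    /// Released when the guard goes out of scope, on success or error.
    fn drop(&mut self) {
        self.log.borrow_mut().push("inhibitor released");
    }
}

fn run_mutation(log: &Log, logind_available: bool, mutation_ok: bool) -> Result<(), String> {
    // Interactive and reversible work runs with no inhibitor held.
    log.borrow_mut().push("prompts and reversible validation");
    // Acquire immediately before the journal; a failure here returns
    // before anything irreversible has been written.
    let _inhibitor = SleepInhibitor::acquire(log, logind_available)?;
    log.borrow_mut().push("journal written");
    if !mutation_ok {
        return Err("mutation failed".to_string());
    }
    log.borrow_mut().push("mutation finished");
    Ok(())
}
```

When acquisition fails, the log contains only the pre-journal entry, which is the "clean validation error, never a recovery-mode lockout" property described above.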
braid replace
The protected scope includes:
- journal write and post-commit phase rewrite
- new-disk LUKS initialization/open
- btrfs replace start
- best-effort old-mapper close for live replacements
- post-replace resize
- post-replace soft RAID1 balance for missing-path replacements that clear the last missing device
The new-target LUKS identity check is deliberately two-tier: the primary gate (cli/src/replace.rs#verify_existing_luks_new_target_preflight) runs pre-journal under the excluded “reversible validation and identity checks” rule above, so an operator disk-swap or backing-drift in the post-confirmation window aborts on the reversible side without stranding pending-op.json; a residual re-probe (probe_existing_luks_new_target_uuid closed-mapper arm, verify_existing_luks_open_mapper_target open-mapper arm) stays post-journal inside the “new-disk LUKS initialization/open” scope to guard the narrow journal->open window that contains the optional slot-1 keyfile enroll. Do not collapse it to one tier.
braid remove
The protected scope includes:
- journal write
- the optional pre-remove pool_balance_single (RAID1→single) when only one device will remain
- btrfs device remove data migration
- post-remove LUKS mapper close and membership persistence
braid remove-missing
The protected scope includes:
- journal write
- btrfs device remove <devid> (chunk relocation via btrfs_shrink_device; can run for minutes when the missing device had data allocated because surviving RAID1 stripes are rewritten into newly allocated chunks on remaining devices)
- post-op membership persistence
- post-commit phase rewrite
- the conditional soft RAID1 balance that converts single-profile chunks (created during degraded operation) back to RAID1 when clearing the last missing device on a multi-disk pool
The inhibitor is acquired unconditionally before journal write, even in the cases where maybe_restore_raid1 will be a no-op. This keeps the boundary rule simple (“acquire before journal”) and matches the rest of the suite. The “savings” of skipping acquisition when the soft balance will not run are tiny on a NAS that is idle most of the time.
braid add
The protected scope includes:
- journal write
- LUKS format/header backup/open of fresh disks
- pool_bootstrap_mount / pool_bootstrap_mount_raid1 (bootstrap path) or pool_add_device followed by the conditional pool_balance_raid1 (add-to-existing-pool path) when the post-add pool has ≥2 devices
- post-op membership persistence
As with remove-missing, the inhibitor is acquired unconditionally before journal write. The bootstrap path’s mkfs phase is fast but still irreversible across the journal boundary; the add-to-existing path’s RAID1 balance is the long-running phase that the inhibitor primarily protects.
The no-op early-return path (all requested disks already in the pool) returns before the inhibitor seam fires — no journal is written, so no protection is required.
braid recover follows the same boundary for replayed destructive work. In particular, add PoolMutation recovery resolves and verifies the needed passphrase before acquiring a sleep inhibitor; the inhibitor is acquired only after reversible credential checks pass and immediately before replaying target preparation or btrfs membership work.
Excluded: braid lock
braid lock deliberately does not acquire the sleep inhibitor, even
though its mutation window (umount + per-mapper cryptsetup close) is
non-trivial in wall-clock time. This is the worked example of the
deciding question below applied to lock work specifically:
- Recoverability. A lock interrupted mid-flight leaves a state that re-running braid lock advances on, to the extent its existing probes can detect. Specifically:
  - plan_lock’s mountpoint -q skips the umount step when the pool is already unmounted (cli/src/lock.rs’s plan_lock).
  - The per-mapper close path checks fs.exists("/dev/mapper/<name>") before issuing cryptsetup close and reports “already closed” otherwise, so closed membership mappers do not re-error on a follow-up run.
  - Orphan mappers (braid-* paths not in pool.json) are re-scanned on each invocation and closed; close failures still surface as fatal errors, and a /dev/mapper scan failure is warned and yields an empty orphan list for that run – not silently swallowed.

  Unlike replace/add/remove/remove-missing, there is no kernel-level topology corruption window and no hours-long restart cost. The point is that a partially-completed lock does not poison subsequent invocations – not that every failure is hidden.
- Shutdown-driven ExecStop. When braid lock runs as braid-online.service’s ExecStop= during system shutdown, the system is heading to shutdown.target/power-off, not to suspend. A sleep inhibitor acquired during that window is redundant – logind does not schedule a suspend transition mid-shutdown.
- Manual stop and user-lock reentry. ExecStop=braid lock also fires on a manual systemctl stop braid-online.service and on the Rust dispatch post-lock mark_offline (cli/src/online_state.rs) for user-initiated braid lock, gated on systemd_lifecycle (see docs/design/decisions/018-systemd-lifecycle.md:131 and modules/braid/storage.nix’s braid-online definition). Those paths do not enjoy the shutdown-driven guarantee above; their justification is the recoverability + short-duration argument, not the shutdown-target one.
- Suspend context. braid-online.service has no Conflicts = sleep.target (see modules/braid/storage.nix). By the 016-auto-suspend.md design the pool stays mounted across suspend, so the only realistic mid-lock-suspend race is a user-initiated braid lock colliding with autosuspend’s idle countdown. That window is narrow (lock is short) and the failure mode is recoverable, per the first bullet.
- ExecStop budget. braid-online.service runs lock under TimeoutStopSec = 5min. Adding subprocess work to that path (a systemd-inhibit fork plus its supervised sh + sleep child) buys no protection commensurate with the added shutdown-path complexity.
If a future change makes lock’s mutation window genuinely long (e.g. a multi-minute pre-lock balance), revisit this exclusion under the same deciding question.
Excluded: braid enroll
braid enroll does not acquire the sleep inhibitor despite mutating
LUKS slot 1 on each pool disk. Applying the deciding question to
standalone enroll specifically:
- No journal, no recovery-mode lockout. Standalone enroll writes no operation journal (EnrollPlan::execute in cli/src/enroll_key_file.rs). Suspend mid-loop cannot strand the operator in recovery mode, which is the failure surface this doc’s “Validation-shaped error before journal write” promise protects against for the four inhibitor-using commands.
- Recoverability. plan_enrollment probes each candidate via probe_keyfile_enrollment and short-circuits disks whose slot 1 already verifies the keyfile (AlreadyEnrolled). A partial enroll leaves only the un-enrolled disks for the next invocation: re-running braid enroll DIR (existing-keyfile mode) advances on partial state, the same property that justifies lock’s exclusion. Note that braid enroll --generate is not same-command idempotent – a partial --generate run leaves DIR/braid.key on disk, and validate_key_file_path refuses a second --generate against an already-present keyfile. Recovery for an interrupted --generate run is to drop --generate and re-run as a regular enroll against the now-existing keyfile.
- Bounded mutation window. Each disk pays one Argon2-bounded cryptsetup luksAddKey (about 2-3 sec on default parameters) plus a sub-second cryptsetup luksHeaderBackup. A three-disk pool’s total enroll window is single-digit seconds with no long-running btrfs work to protect.
- No btrfs topology mutation; LUKS2 writes use cryptsetup metadata locking. Enroll does not touch btrfs membership or chunk allocation, which is the topology-corruption risk surface this doc was written to protect. LUKS2 metadata writes are serialized by cryptsetup’s own metadata locking. After each successful cryptsetup luksAddKey, apply_enrollment writes a local .luksheader as input to the existing off-system backup workflow (see docs/internals/luks-unlock.md); the local file is a transient byproduct of a successful mutation, not a recovery mechanism for an interrupted one. Recovery from actual header damage uses the operator’s off-system backup, identical to every other LUKS-mutating command in braid.
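The recoverability argument above can be illustrated with a minimal Rust sketch. It is not braid's plan_enrollment: the Probe enum and the remaining_disks function are hypothetical stand-ins that only show why a re-run after an interruption picks up where the previous run stopped.

```rust
/// Hypothetical probe result per disk; braid's real probe type is not shown in this doc.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Probe {
    /// Slot 1 already verifies the keyfile.
    AlreadyEnrolled,
    /// Slot 1 still needs the keyfile.
    NeedsEnrollment,
}

/// Return only the disks that still need work, preserving input order.
/// Disks that are already enrolled are short-circuited.
pub fn remaining_disks<'a>(probes: &[(&'a str, Probe)]) -> Vec<&'a str> {
    probes
        .iter()
        .filter(|(_, probe)| *probe == Probe::NeedsEnrollment)
        .map(|(name, _)| *name)
        .collect()
}
```

Running the same function against the state left by a partial run yields only the disks that were not reached, which is the property the exclusion relies on.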
The same luks::enroll_key_file call is held under an inhibitor when
invoked from braid add --enroll or braid replace --enroll, but that
is incidental: those commands already hold an inhibitor for their
journal-protected btrfs work, and the keyfile call happens inside that
existing window. Standalone braid enroll has no btrfs work to protect
and no journal boundary to guard, so an inhibitor would buy nothing.
If a future change adds long-running follow-up work to braid enroll
(e.g. a pool-wide rekey or a balance after enrollment), revisit this
exclusion under the same deciding question.
Consequences
- suspend is blocked only when interruption is actually dangerous
- operators are not prevented from suspending the host while braid is still waiting on human input
- add, remove, remove-missing, and replace all follow the same boundary rule; future long-running commands should reuse it instead of inventing command-specific behavior
- failure to acquire the inhibitor (e.g. logind unreachable) is a clean validation error before the journal is written, never a recovery-mode lockout
The same default does not automatically apply to every long-running task; the deciding question is whether suspend would make the operation incorrect, unsafe, or expensive to restart.
UPS Integration
Principles:
Context
A btrfs RAID1 pool tolerates clean shutdowns, but sudden power loss during active I/O – especially during a long-running btrfs replace, btrfs device remove, or post-add/remove balance – can leave the pool in a state that requires manual recovery. This is the same risk surface that decision 019 protects against for suspend/wake, but it cannot use the same control model. A sleep inhibitor actively blocks the operating system from suspending; braid cannot analogously block a UPS from running out of battery. The control model here is different: reject avoidable starts on battery up front, and prove journal recovery for the unavoidable mid-mutation case.
A UPS solves this only if the host cooperates. NUT (Network UPS Tools) is the standard Linux interface, and nixpkgs already provides a mature power.ups module that configures NUT declaratively – units, users, udev rules, killpower handling. braid’s job is not to reimplement that, but to layer opinionated policy on top so that enabling UPS support gives a home NAS three specific guarantees:
- Orderly shutdown before battery exhaustion for ordinary mounted operation.
- Preflight refusal to start pool-mutating commands unless the UPS reports verified utility power (OL).
- Live UPS state visible in braid ups status and the TUI; live UPS status is used for preflight safety and upsmon critical-state shutdown (normally OB+LB together, per reference/nut/clients/upsmon.c:1404).
The guarantees do not extend to “safe against any power loss.” A UPS firing LB during a mutation that started on AC still interrupts that mutation. Recovery for that case falls to the existing journal + braid recover path, and must be proven per mutation class by VM tests before this decision flips to Active.
“Just alert the user on low battery” is insufficient for guarantee (1): a prolonged outage with nobody present would still exhaust the battery during an active mount. The host must power off before battery exhaustion, because decision 018’s teardown sequence (braid-online.service ExecStop -> btrfs umount -> luks close) needs a non-trivial window of live power to complete cleanly.
Decision
Scope: standalone, USB, single-host
v1 supports one NUT-compatible UPS connected over USB to the NAS, monitored by the NAS itself. Not supported as first-class:
- networked NUT (primary/secondary across multiple machines)
- serial apcsmart, snmp-ups, or other non-USB drivers
- multiple UPSes per host
An escape hatch (driver = "...", port = "...") exists for users whose UPS speaks a non-USB protocol, but braid does not guarantee correct behavior outside the USB path.
Rationale: USB UPSes cover the vast majority of home NAS deployments. Every non-USB topology adds configuration surface (network auth, SNMP community strings, serial port permissions) that braid would have to validate and test. Single-host standalone avoids the two-machine primary/secondary dance and its timing/credential complexity.
Wrap power.ups, do not reimplement NUT
The braid module sets power.ups.* values from its higher-level options. It does not write ups.conf, upsd.conf, or upsmon.conf directly, and does not define its own nut-* systemd units.
This is a deliberate departure from the pattern in modules/braid/fan-control.nix, which owns its unit because nixpkgs’ hddfancontrol module has concrete lifecycle bugs. The nixpkgs power.ups module has no equivalent known defect; reimplementing its surface would duplicate work and diverge over time.
Data source: shell out to upsc
The TUI and braid ups status command read UPS state by invoking upsc <name> and parsing its key/value output. This matches every other braid parser (btrfs, cryptsetup, lsblk, smartctl, smartd, hddfancontrol).
A parse_upsc module in cli/src/parse/ handles the parse, with stable and unstable golden fixtures in cli/tests/fixtures/. NUT (networkupstools) joins btrfs-progs, cryptsetup, and util-linux in the parser-critical toolchain (see decision 010 and parser compatibility), with fixture refresh required on any nixpkgs bump that changes its pinned version.
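As a rough illustration of the key/value parse, here is a minimal Rust sketch. It assumes the usual upsc output of one "name: value" pair per line; the function name parse_upsc_output and the choice of a BTreeMap are illustrative assumptions, not the actual parse_upsc module.

```rust
use std::collections::BTreeMap;

/// Parse `upsc <name>` output into a variable map.
/// Assumption: each useful line is `key: value`; anything else is skipped.
pub fn parse_upsc_output(output: &str) -> BTreeMap<String, String> {
    let mut vars = BTreeMap::new();
    for line in output.lines() {
        // Split at the first colon only, so values containing ':' stay intact.
        if let Some((key, value)) = line.split_once(':') {
            let key = key.trim();
            if key.is_empty() {
                continue;
            }
            vars.insert(key.to_string(), value.trim().to_string());
        }
    }
    vars
}
```

An empty value is kept as an empty string rather than dropped, so a caller can still tell "ups.status present but empty" apart from "ups.status missing".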
Pinning is load-bearing. A new braid.packages.networkupstools option is added alongside the existing btrfsProgs, cryptsetup, and utilLinux pins, defaulted to nixos-26.05’s networkupstools. The module uses this pin to configure the NUT package the power.ups service resolves (exact nixpkgs option name to confirm during implementation) and includes the same derivation in the CLI wrapper’s PATH so that upsc invoked from braid ups status resolves to the tested version rather than whatever the host’s system path provides. Decision 010 and principle 10 are updated in the same implementation to name NUT as parser-critical.
ups.status is parsed into an ordered, deduplicated list of flags (OL, OB, LB, CHRG, DISCHRG, RB, …), not an enum. Flags are stored in upsc emission order; membership and dedup give set semantics without imposing a sort. Display severity is derived from the combination; unknown tokens are preserved in the parsed model so that new NUT statuses do not silently disappear.
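The flag-list model can be sketched as follows. This is a minimal illustration under stated assumptions: the StatusFlag enum, its variant names, and parse_status_flags are hypothetical, and only the tokens named in the paragraph above get dedicated variants.

```rust
/// One token from `ups.status`. `Unknown` keeps unrecognized tokens so that
/// statuses added by newer NUT releases are not silently dropped.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum StatusFlag {
    Ol,
    Ob,
    Lb,
    Chrg,
    Dischrg,
    Rb,
    Unknown(String),
}

impl StatusFlag {
    fn from_token(token: &str) -> Self {
        match token {
            "OL" => StatusFlag::Ol,
            "OB" => StatusFlag::Ob,
            "LB" => StatusFlag::Lb,
            "CHRG" => StatusFlag::Chrg,
            "DISCHRG" => StatusFlag::Dischrg,
            "RB" => StatusFlag::Rb,
            other => StatusFlag::Unknown(other.to_string()),
        }
    }
}

/// Split on any whitespace, keep first-seen order, and drop repeated tokens.
/// No sort is applied, so the result follows emission order.
pub fn parse_status_flags(raw: &str) -> Vec<StatusFlag> {
    let mut flags: Vec<StatusFlag> = Vec::new();
    for token in raw.split_whitespace() {
        let flag = StatusFlag::from_token(token);
        if !flags.contains(&flag) {
            flags.push(flag);
        }
    }
    flags
}
```

A Vec with a membership check gives set semantics without losing order; the linear scan is acceptable because a status line carries only a handful of tokens.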
braid ups status defaults to a curated human-readable summary and supports --json for the typed parsed model. Raw upsc passthrough is not exposed; users who want that can still run upsc directly.
The --json success shape preserves the typed parsed model at top level. If upsc exits 0 but ups.status is empty or missing, the JSON output stays exit 0 and adds top-level "warning": "ups_status_empty" beside the parsed body. Scripts must treat either .error or .warning as a sentinel that the body is not trusted healthy UPS state. The status_flags array preserves first-seen ups.status token order across the human, --json, and TUI surfaces – braid imposes no sort of its own. Order is deterministic for a given UPS state (whitespace is normalized and repeated tokens collapse to first-seen); it is not a byte copy of the raw ups.status: line.
Shutdown-on-LB = systemctl poweroff
When NUT fires the low-battery (LB) event, upsmon runs systemctl poweroff. systemd’s standard shutdown sequence then unwinds braid-online.service (decision 018), which closes the btrfs mount and LUKS mappers. The host powers off via normal means before the UPS exhausts its battery.
This is not “alert only.” The host genuinely shuts down, because the only safe state during a prolonged power outage is off. An alert-only policy would require the user to react in time, which defeats the point of unattended operation.
Reject pool-mutating commands unless UPS reports utility power (preflight hygiene only)
When braid.ups.enable = true, braid add, braid remove, braid remove-missing, and braid replace query UPS status at preflight and refuse with a Validation-shaped error unless the UPS status can be trusted as explicitly on utility power. The check is fail-closed: it refuses on upsc invocation or query failure (dead upsd, unknown UPS name, or exec failure), an empty or missing ups.status, any critical flag (LB, TESTFAIL, COMMBAD, FSD – the same set the TUI paints red), on-battery (OB), or any status set missing OL. Known non-critical advisory states such as OL RB, and unknown tokens co-present with OL and no known blocker, still pass because utility power is explicitly present. The check sits alongside the existing preflight checks, before any journal write.
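The fail-closed rule can be written down as a small decision function. The sketch below is an assumption-laden illustration, not braid's preflight code: PreflightRefusal, its variants, and check_utility_power are hypothetical names, and the query failure is modelled as a None input.

```rust
/// Why the preflight refused. Variant names are illustrative only.
#[derive(Debug, PartialEq, Eq)]
pub enum PreflightRefusal {
    QueryFailed,
    EmptyStatus,
    CriticalFlag(String),
    OnBattery,
    NoUtilityPower,
}

/// The critical flags listed in the paragraph above.
const CRITICAL: [&str; 4] = ["LB", "TESTFAIL", "COMMBAD", "FSD"];

/// `status` is `None` when the `upsc` query itself failed,
/// otherwise the raw `ups.status` value.
pub fn check_utility_power(status: Option<&str>) -> Result<(), PreflightRefusal> {
    let raw = status.ok_or(PreflightRefusal::QueryFailed)?;
    let tokens: Vec<&str> = raw.split_whitespace().collect();
    if tokens.is_empty() {
        return Err(PreflightRefusal::EmptyStatus);
    }
    if let Some(flag) = tokens.iter().find(|t| CRITICAL.contains(*t)) {
        return Err(PreflightRefusal::CriticalFlag(flag.to_string()));
    }
    if tokens.contains(&"OB") {
        return Err(PreflightRefusal::OnBattery);
    }
    // Anything that does not explicitly report utility power is refused.
    if !tokens.contains(&"OL") {
        return Err(PreflightRefusal::NoUtilityPower);
    }
    // OL is present and no known blocker was seen; advisory or unknown
    // tokens such as RB do not block.
    Ok(())
}
```

Under this sketch "OL RB" and "OL" plus an unknown token pass, while "OL LB", "OB DISCHRG", an empty status, and a failed query are all refused.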
This is preflight hygiene, not a mutation-window guarantee. It narrows the surface that journal recovery must cover by rejecting the easy case – “user starts braid replace while the power is already out” – but it cannot and does not prevent LB from firing mid-mutation on work that started on AC. Mid-mutation power loss is handled by the existing journal + braid recover path; see the recovery-proof obligation in Open Questions.
This is not the power-side equivalent of decision 019’s sleep inhibitor. A sleep inhibitor actively blocks suspend for the duration of the mutation window; braid cannot analogously block UPS-driven shutdown, because the UPS is dying and no amount of inhibiting changes that. Instead, the contract is: reject the avoidable case up front, and rely on recovery for the unavoidable case.
Alert-model integration is deferred
Integrating UPS events into the shared AlertState / AlertCause model (decision 014) is deferred to a future ADR. Decision 014 guarantees “alerts stay latched until braid ack” – the right shape for event-driven causes (disk errors, smartd), but wrong for live-state conditions like OB / LB (users expect those to clear when the UPS returns to OL). Reconciling that requires splitting AlertCause by persistence semantics (LatchedUntilAck vs. ActiveWhileConditionHolds) and updating merge_into_latch, ack, status, and the alert-test matrix. That is a core-invariant change that deserves its own ADR; smuggling it into UPS v1 would conflate two distinct concerns.
v1 therefore surfaces UPS state only through braid ups status and the TUI. Operators who are not actively watching those surfaces will not see on-battery or comms-loss conditions asynchronously in v1. This is a known gap until the follow-up ADR lands.
braid-online becomes safety-critical under UPS
mark_online (cli/src/online_state.rs) warns and exits successfully when systemctl start braid-online.service fails after a successful unlock/add/recover. When UPS support is enabled, this silent-degradation path is unsafe: the user believes LB will trigger a clean shutdown, but without braid-online.service active, its ExecStop does not run and LUKS close is not guaranteed to complete before power dies.
braid doctor and the TUI flag “pool mounted but braid-online inactive” as a high-severity configuration fault whenever UPS support is enabled. mark_online’s warn-and-continue behavior otherwise remains unchanged; the UPS path adds a new detector, it does not change the underlying unlock sequence. Under systemd_lifecycle = false (CLI-only), the lifecycle path is skipped entirely; the UPS-safety detector fires only when systemd_lifecycle = true and UPS support is enabled.
Upsmon credential lifecycle
NUT requires upsmon to authenticate to upsd even in single-host standalone mode. The credential lives at /var/lib/braid/upsmon.pass with mode 0600, owned by root, outside the Nix store.
Generation: a oneshot braid-ups-secrets.service creates the file if absent with a random token (e.g. head -c 24 /dev/urandom | base64) and exits. The oneshot is wired with before = [ "upsd.service" "upsmon.service" ] and requiredBy = [ "upsd.service" "upsmon.service" ] (the actual nixpkgs power.ups unit names), so upsd and upsmon hard-fail to start if secret creation fails rather than racing it. systemd.tmpfiles rules ensure /var/lib/braid/ exists with correct ownership before the oneshot runs. The file is stable across rebuilds; regeneration happens only on explicit deletion. No rotation is performed because the scope is loopback upsmon<->upsd on a single host.
Reference: the rendered NUT configs consume the file via power.ups.users.<name>.passwordFile and power.ups.upsmon.monitor.<name>.passwordFile (not inline passwords), so the token never enters the Nix store or nix-store --query output.
Proposed config surface
braid = {
  enable = true;
  ups = {
    enable = true;
    name = "ups"; # identifier used by upsd and upsc
    driver = "usbhid-ups"; # USB default; covers the vast majority of UPSes
    port = "auto"; # usbhid-ups's standard "find the device" value
  };
};
Defaults applied internally, not surfaced as options in v1:
- standalone mode (upsd + upsmon on the same host, no network monitors)
- SHUTDOWNCMD = systemctl poweroff
- upsmon credentials per “Upsmon credential lifecycle”
Note: NOTIFYCMD is intentionally not configured in v1 – alert-model integration is deferred (see “Alert-model integration is deferred” above).
The configured name is also written to /etc/braid/config.json so that braid ups status and the TUI do not have to guess which UPS to query.
Deferred
- networked NUT (primary/secondary across hosts)
- non-USB drivers as first-class support (work via escape hatch, not tested)
- pre-shutdown grace window with braid ups abort-shutdown
- battery-age reminders driven by battery.mfr.date + the RB status flag
- multi-UPS per host
- UPS-triggered automatic pause of running balance (scrub is cancelled on shutdown and resumed on next pool activation; crash-paused owed RAID1 balance now fails closed in recover while idle/no-paused owed replay still runs)
Resolved questions
Each of these blocked the flip from Draft to Active. All three are now closed by VM tests committed in tests/module/.
- Recovery-proof for mid-mutation power loss (primary blocker). Resolved by the four VM tests in plans/impl/2026-04-21-forced-shutdown-recovery-proof.md’s matrix: ups-lb-during-replace, ups-lb-during-remove, ups-lb-during-remove-missing, and ups-lb-during-balanced-add. Each fires OB LB via upsrw while a different mutation class is in flight, lets systemctl poweroff run, reboots the VM, and runs braid recover. The idle/no-paused recovery path still asserts the post-recover state matches what the original mutation would have produced – including no orphaned LUKS mappers, no MISSING btrfs entries, no remaining single-profile chunks where RAID1 was intended, and a cleared pending-op.json. The crash-paused owed RAID1 subcase is intentionally narrower: ups-lb-during-remove-missing and the paused branch of ups-lb-during-balanced-add now assert that recover preserves pending-op.json, leaves single-profile chunks visible, and asks for manual btrfs inspection instead of replaying a balance. The Pre-M11 audit also surfaced two cli/src/recover.rs gaps that the same plan landed before the matrix ran: pool_resize_device is now replayed for OpKind::Replace, and a soft RAID1 balance is replayed for OpKind::Add, OpKind::RemoveMissing, and OpKind::Replace only when btrfs balance status is idle; see balance-soft for the underflow rationale behind the fail-closed branch.
- Shutdown ordering for ordinary mounted operation. Resolved by tests/module/ups-lb-clean-shutdown.{nix,py} (Plan 1’s M7). The VM test mounts an idle pool, fires OB LB via upsrw, and asserts braid-online.service’s ExecStop completes (and is not killed by TimeoutStopSec) before poweroff. The default TimeoutStopSec = 5min is sufficient for a single-disk pool; larger pools should retain that headroom.
- Battery-low threshold. Resolved with the upstream NUT default. Plan 1’s M7 (ups-lb-clean-shutdown) passed without raising battery.runtime.low from its driver-dependent default (often 120s). That test deliberately imports tests/module/lib/ups-fixture.nix with upsmonTimings = null so upsmon runs at upstream POLLFREQ/POLLFREQALERT/FINALDELAY = 5/5/5 – the runtime-budget claim is therefore backed by representative timings, not the squeezed 1/1/0 cadence the Plan 3 matrix tests use to keep the LB-detection window narrower than an in-flight mutation. Larger real-world pools that risk exceeding the default budget can override power.ups.upsmon.settings (or the driver’s battery.runtime.low) at the deployment level; braid does not need a dedicated option for v1.
Consequences
- enabling UPS support is one line of Nix, plus two optional strings for non-default drivers
- for ordinary mounted operation, the host powers off cleanly on low battery without user intervention
- pool-mutating commands refuse to start unless utility power (OL) is verified, narrowing the journal-recovery surface to the mid-mutation case
- mid-mutation power loss is a supported recovery case, not a guarantee: braid recover is load-bearing for replace / remove / remove-missing / balanced add interrupted by LB-driven shutdown, and VM tests prove both the idle/no-paused success path and the crash-paused owed RAID1 fail-closed path
- live UPS state is visible in braid ups status and the TUI; users not actively watching those surfaces do not get asynchronous notifications in v1 (alert-model integration deferred to a future ADR)
- NUT joins btrfs-progs, cryptsetup, and util-linux as a pinned parser-critical tool; nixpkgs bumps touching networkupstools trigger the same fixture-refresh obligation as the other three
- the existing braid-online.service lifecycle (decision 018) is load-bearing under UPS; its failure mode is no longer acceptable silent degradation and braid doctor reflects that
Superseded by Principle 13.
Wait rows in unlock and shared mount helpers
Principle: 13. Announce long-running work
Context
The single-passphrase invariant
(Principle 4) requires braid unlock to verify the supplied credential against every reachable LUKS
member before opening any mapper. On a 3-disk pool that is three
sequential Argon2 derivations – visible to the user as three
back-to-back [wait] passphrase: checking against ... rows.
Two later phases of the same command stayed silent:
- Per-disk cryptsetup luksOpen. cryptsetup re-derives Argon2 inside luksOpen even after --test-passphrase already verified. The user saw three [ok] disk X: unlocked rows arrive one by one with no leading announcement.
- Mount phase. scan_and_mount runs btrfs device scan + mkdir + mount in sequence and emits a single [ok] pool: mounted ... row at the end. None of the three steps are announced individually.
The result: between the last verify row and the mount row, the user stared at an inactive terminal for several seconds with no signal that anything was happening.
Options considered
- TTY spinner / progress bar. Rejected: requires a TTY, fights log capture, and looks broken inside braid-auto-unlock.service journals.
- Best-effort ad-hoc waits. Add [wait] rows whenever a gap is noticed. Rejected: gaps recur whenever a new slow path is added.
- Codify “[wait] before every long-running step” as a project principle now. Rejected: principles are authoritative (docs/design/principles.md); a principle the codebase doesn’t satisfy on the day it lands is a documentation bug. add, replace, remove, remove-missing, recover’s own (non-shared) slow paths, enroll, and lock keep silent gaps today.
- Scope the rule to braid unlock and the shared mount helpers today; promote to a principle once the other commands comply. Accepted.
Decision
braid unlock – and braid recover’s mount tail, which routes
through the same shared helpers – emit a [wait] row before every
long-running step:
- per-disk cryptsetup luksOpen (passphrase and keyfile arms), worded [wait] disk {name}: unlocking...;
- the mount phase (btrfs device scan + mkdir + mount), worded [wait] pool: mounting {mount_point}.... The single row covers all three steps because emitting one row per step would be noisy without buying the user any actionable information.
Each [wait] row uses
status_tag::status_line(StatusTag::Wait, ...) and is closed by the
existing per-step success row.
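To make the row shape concrete, here is a minimal Rust sketch of a status-row helper. The real signatures in cli/src/status_tag.rs are not shown in this doc, so the StatusTag variants, the label method, and the (tag, message) signature of status_line below are assumptions modelled on the row wording quoted above.

```rust
/// Assumed tag set, based on the [wait] / [ok] / [skip] rows mentioned in these docs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StatusTag {
    Wait,
    Ok,
    Skip,
}

impl StatusTag {
    fn label(self) -> &'static str {
        match self {
            StatusTag::Wait => "wait",
            StatusTag::Ok => "ok",
            StatusTag::Skip => "skip",
        }
    }
}

/// Render one status row, e.g. "[wait] disk a: unlocking...".
pub fn status_line(tag: StatusTag, message: &str) -> String {
    format!("[{}] {}", tag.label(), message)
}
```

A caller would print the Wait row to stderr immediately before the slow call and the matching Ok row after it returns, so every announcement is eventually closed by a success row.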
Other interactive commands keep their current behavior until they are individually updated.
Tradeoffs accepted
- Slightly more verbose stderr.
- Enforcement for the in-scope helpers is by VM-test assertion in tests/cli/braid-unlock.py, tests/cli/braid-unlock-key-file.py, and tests/cli/braid-recover.py. Project-wide enforcement is deferred until promotion to a principle.
- braid recover inherits the new rows automatically because execute_mount_only and execute_unlock_and_mount are shared. This is desirable: recover’s mount tail is exactly the same long-running work as unlock’s.
Promotion outcome
Promoted to Principle 13 once add, replace, remove,
remove-missing, recover’s replay tail and self-mount remount
cycle, lock, and enroll were brought into compliance.
See
- cli/src/mount.rs – open_disks_with_credential and scan_and_mount host the new rows.
- cli/src/status_tag.rs – the canonical StatusTag::Wait and status_line helpers.
- cli/src/unlock.rs:93-96 – the already-mounted short-circuit that returns before any helper runs (so already-mounted unlocks emit no new rows).
Dry-run preview model
Principles:
Context
Intent commands originally mixed dry-run rendering seams with execution
planning. Some commands compiled Vec<Step> directly for preview tests, while
execution consumed separate command-specific state. That made it too easy for a
dry-run preview to drift from the work a real run would perform, especially
around LUKS preparation, btrfs mutations, journals, cleanup, and follow-up
maintenance such as resize or balance.
The current model keeps dry-run preview and execution tied to the same typed
semantic decision. Step is only the output shape used to show a preview; it is
not the plan.
Decision
For migrated mutating commands, dispatch owns the read-side fences that must
run under the pool lock before the planner starts: pending-operation preflight
and config loading. The pending-operation preflight must run before config load
so a recovery journal is never hidden behind a config parse error. The planner
then owns pool state loading, live probes, accumulated preview notes, and
construction of a typed work plan. This split finishes the Rust-owned pool-lock
migration: the lock boundary and the config/journal reads it protects now live
above plan_*(), while dry-run and real execution still share the same typed
plan. The command wrapper calls the planner first. On --dry-run, it prints
plan.preview() to stdout. On a real run, it passes the same plan to
execute().
A successful command plan carries:
- accumulated PreviewNotes, in the order they must render;
- a typed WorkPlan containing the semantic choices execution needs.
preview() is the public dry-run boundary. It constructs a Preview whose
steps come from work_plan.render_steps(). Notes render first, then steps. A
plan struct must not cache a rendered Vec<Step> alongside its work plan.
execute() consumes the same typed WorkPlan. It must not rediscover or
reinterpret semantic choices already made during planning. It may still perform
execution-time validation that dry-run intentionally cannot do, such as checks
that require a passphrase or a mapper that was closed during planning.
Step is output-only. It may describe risk, human text, and representative
commands for dry-run rendering, but it must not become an execution source, a
planning cache, or a second semantic model.
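The relationship between plan, preview, and execution can be sketched in Rust. Everything below is a simplified illustration under stated assumptions: the field names, the note and step wording, and the string "actions" returned by execute are hypothetical, and only the shape (steps derived on demand from the typed work plan, never cached) follows the rules above.

```rust
/// Output-only row shown in a dry-run; never read back by execution.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Step {
    pub text: String,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PreviewNote(pub String);

/// Hypothetical typed work plan for a remove-like command.
#[derive(Debug, Clone)]
pub struct WorkPlan {
    pub disk: String,
    pub convert_to_single: bool,
}

impl WorkPlan {
    /// Derive the preview rows from the semantic choices.
    pub fn render_steps(&self) -> Vec<Step> {
        let mut steps = Vec::new();
        if self.convert_to_single {
            steps.push(Step { text: "balance RAID1 -> single".to_string() });
        }
        steps.push(Step { text: format!("remove disk {}", self.disk) });
        steps
    }
}

pub struct Preview {
    pub notes: Vec<PreviewNote>,
    pub steps: Vec<Step>,
}

impl Preview {
    /// Notes render first, then numbered steps.
    pub fn render(&self) -> String {
        let mut out = String::new();
        for note in &self.notes {
            out.push_str(&format!("note: {}\n", note.0));
        }
        for (index, step) in self.steps.iter().enumerate() {
            out.push_str(&format!("{}. {}\n", index + 1, step.text));
        }
        out
    }
}

/// The plan stores semantic work only; there is no cached Vec<Step> field.
pub struct Plan {
    pub notes: Vec<PreviewNote>,
    pub work_plan: WorkPlan,
}

impl Plan {
    pub fn preview(&self) -> Preview {
        Preview { notes: self.notes.clone(), steps: self.work_plan.render_steps() }
    }

    /// Stand-in for real execution: reads the same WorkPlan fields and
    /// returns the actions it would perform instead of running commands.
    pub fn execute(self) -> Vec<String> {
        let mut actions = Vec::new();
        if self.work_plan.convert_to_single {
            actions.push("balance:single".to_string());
        }
        actions.push(format!("remove:{}", self.work_plan.disk));
        actions
    }
}
```

Because both preview and execute read the same WorkPlan, a change to a semantic choice shows up in the dry-run and the real run together.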
When planning accumulates notes and then fails later, use a report shape that returns both the error and the accumulated notes. The command wrapper renders those notes to stderr before returning the error, using the same preview note renderers that dry-run stdout uses. This preserves context without duplicating wording.
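One possible report shape is sketched below. The names PlanReport, render_note, plan_example, and report_notes_then_return are hypothetical, as is the example note text; the sketch models only a real run, where notes go to stderr whether or not planning later failed.

```rust
/// Carries the accumulated notes alongside the planning outcome,
/// so a late error does not discard earlier context.
pub struct PlanReport<P> {
    pub notes: Vec<String>,
    pub outcome: Result<P, String>,
}

/// Shared note renderer, so wording is identical across output modes.
pub fn render_note(note: &str) -> String {
    format!("note: {}", note)
}

/// Illustrative planner: records a note, then succeeds or fails.
pub fn plan_example(disk_present: bool) -> PlanReport<&'static str> {
    let notes = vec!["disk b has no SMART data".to_string()];
    let outcome = if disk_present {
        Ok("plan")
    } else {
        Err("disk b not found".to_string())
    };
    PlanReport { notes, outcome }
}

/// Real-run wrapper: render notes to the stderr sink, then hand back the outcome.
/// `stderr` is a Vec here so the behaviour is easy to inspect.
pub fn report_notes_then_return<P>(
    report: PlanReport<P>,
    stderr: &mut Vec<String>,
) -> Result<P, String> {
    stderr.extend(report.notes.iter().map(|note| render_note(note)));
    report.outcome
}
```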
Output contract
The structured dry-run preview lives on stdout. Preview notes are part of that stdout preview. Real-run notes, and notes preserved on a later planning error, render to stderr through the shared preview renderers so warning and info wording stays byte-compatible across modes.
Confirmation UI
Confirmation UI is not a preview note. The interactive !params.yes block –
the command summary, yes/no prompt, and go/no-go safety warnings attached to
that prompt – is deliberately absent from both --dry-run and --yes output.
In cli/src/remove.rs and cli/src/replace.rs, the 1-disk redundancy warning
belongs to this class because it gates the operator’s final decision about an
explicitly requested action, rather than reporting a discovered precondition.
For remove 2->1, dry-run still surfaces the redundancy-loss consequence as
the RAID1 -> single balance step. For replace, the 1-disk warning is
confirmation-only context for a pool that is non-redundant before and after;
dry-run previews the replacement steps, and no redundancy-changing step exists
for that warning.
Long-running side-effect-free probes that run while building a preview may emit
[wait] / [ok] / [skip] status rows to stderr per
Principle 13. Those rows are
not part of the structured preview.
Fresh-format identity placeholder
A fresh LUKS format mints its identity per-invocation at plan time (ADR-024), so the UUID a real run will write does not yet exist when dry-run renders. Showing the minted UUID would make the preview non-reproducible – two dry-runs of the same command would differ – and misleading, since that value is discarded when dry-run returns and a later real run mints a different one.
So the two fresh-format render sites (cli/src/add.rs#AddWorkPlan::render_steps
and cli/src/replace.rs#ReplaceWorkPlan::render_steps) emit a preview-only
cli/src/cmd.rs#CmdRequest, CryptsetupLuksFormatPreview, whose to_argv
renders a fixed --uuid '<generated-at-format-time>' placeholder (single-quoted
by shell_words). The real run uses CryptsetupLuksFormat with the journaled
identity. Both render through one shared cli/src/cmd.rs#luks_format_argv
builder, so a future luksFormat flag appears in both at once – the “representative
commands” / “Step is output-only” rules in the Decision section still hold; this
is the one place the rendered command intentionally diverges from the real argv.
The preview variant is never executed: cli/src/cmd.rs#RealRunner hard-errors on
it via cli/src/cmd.rs#CmdRequest::is_preview_only before any spawn.
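A compact Rust sketch of the preview-only variant follows. It is an illustration, not the contents of cli/src/cmd.rs: the enum fields, the exact flag list in luks_format_argv, and the run function standing in for RealRunner are assumptions, and shell quoting of the placeholder is left out.

```rust
/// Placeholder shown in dry-run output instead of a minted UUID.
pub const UUID_PLACEHOLDER: &str = "<generated-at-format-time>";

#[derive(Debug, Clone)]
pub enum CmdRequest {
    /// Real request: carries the journaled identity.
    CryptsetupLuksFormat { device: String, uuid: String },
    /// Preview-only request: has no identity to carry.
    CryptsetupLuksFormatPreview { device: String },
}

/// Single shared builder, so a new flag reaches both variants at once.
/// The flag set here is illustrative.
fn luks_format_argv(device: &str, uuid: &str) -> Vec<String> {
    ["cryptsetup", "luksFormat", "--type", "luks2", "--uuid", uuid, device]
        .iter()
        .map(|part| part.to_string())
        .collect()
}

impl CmdRequest {
    pub fn to_argv(&self) -> Vec<String> {
        match self {
            CmdRequest::CryptsetupLuksFormat { device, uuid } => luks_format_argv(device, uuid),
            CmdRequest::CryptsetupLuksFormatPreview { device } => {
                luks_format_argv(device, UUID_PLACEHOLDER)
            }
        }
    }

    pub fn is_preview_only(&self) -> bool {
        matches!(self, CmdRequest::CryptsetupLuksFormatPreview { .. })
    }
}

/// Stand-in for a real runner: refuses preview-only requests before any spawn
/// and otherwise returns the argv it would execute.
pub fn run(request: &CmdRequest) -> Result<Vec<String>, String> {
    if request.is_preview_only() {
        return Err("preview-only request must never be executed".to_string());
    }
    Ok(request.to_argv())
}
```

Two dry-runs of the same command render the same argv under this scheme, which is the reproducibility property the placeholder exists for.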
recover is excluded: cli/src/recover.rs#render_add_pool_mutation_recovery_steps
also emits CryptsetupLuksFormat, but its UUID comes from the committed journal
– reproducible and meaningful – so recover keeps rendering the real identity.
Scope
The typed work-plan preview model is the precedent for add, replace,
remove, remove-missing, and recover.
Recover is the one deliberate exception to the read-side planner rule. When
recovering an interrupted existing-pool add and the pool is not already mounted,
plan_recover reconciles the validated add-targets – those present,
LUKS-openable, and not yet pool members – before mount: it opens any whose
mapper is closed (resolving the unlock credential once, and only then), and
btrfs-scans a target only when its mapper shows a btrfs signature. All of this
is gated by !dry_run (discover_add_targets_before_mount, after an
already-mounted short-circuit). The preflight is non-destructive and exists for
two reasons: resolving the credential in the preflight window where an
interactive prompt belongs, then caching it so execute reuses it without a
second prompt (single passphrase, Principle 4); and making an
already-committed-but-closed target visible to the kernel before the initial
mount so the mount assembles it instead of recover re-adding or re-formatting
it. It is not a general license to mutate inside plan_*().
The LUKS-UUID-identity migration also gave lock a typed close set
(LockCloseSet carrying ordered LockMapperClose entries in
cli/src/lock.rs). Dry-run step compilation (compile_lock_steps),
btrfs device scan --forget, and LockPlan::execute all read from
that close set so preview and real execution share one identity
classification. LockPlan::preview() derives Vec<Step> on demand
from the close set rather than caching rendered steps.
Older dry-run seams in unlock and enroll may remain until those
commands are intentionally migrated. Do not use their older helpers or
cached step fields as precedent for commands already on the typed
work-plan model.
Consequences
- Tests about user-visible dry-run output should prefer plan_*() followed by plan.preview().render().
- Tests about the step list should use plan.preview().steps.
- Narrow leaf-renderer tests may call work_plan.render_steps() directly when reaching the case through plan_*() would require noisy unrelated setup.
- New migrated command plans should store semantic work, not rendered steps.
See
- cli/src/preview.rs – Preview, PreviewNote, and canonical rendering.
- cli/src/cmd.rs – Step and dry-run command rendering.
- docs/design/decisions/012-intent-cli.md – intent-command safety model and dry-run probe constraints.
- plans/impl/2026-05-06-unify-cli-plan-execution.md – historical implementation plan for the migration that introduced this typed work-plan preview model.
Secret handling discipline
Related:
- 004-single-passphrase.md
- 018-systemd-lifecycle.md
- cli/src/secret.rs
Context
braid handles two kinds of LUKS secret material in process memory:
- user-entered passphrases used for cryptsetup open, verify, format, and keyfile enrollment;
- generated keyfile bytes used for slot-1 auto-unlock enrollment.
These values must exist briefly in process memory, but they should not escape that narrow window through ordinary Rust strings, buffered readers, command arguments, debug output, or long-lived scopes.
Decision
LUKS passphrase plaintext is represented by secret::Passphrase, a newtype
around Zeroizing<String>. LUKS keyfile byte buffers remain
Zeroizing<[u8; KEYFILE_SIZE]> because the generated bytes never leave the
function frame that writes them.
Every passphrase read path must use unbuffered Read, not BufRead, and must
consume input one byte at a time into pre-sized zeroizing storage. This avoids
std-internal buffering that can retain plaintext outside braid-owned
Zeroizing values. Confirmation reads in cli/src/confirm.rs intentionally
accept Read, not BufRead, for the same reason: confirmation must not
pre-drain bytes needed by a later --passphrase-stdin read.
Every secret-bearing read must enforce a hard byte cap while reading.
Passphrase reads use PASSPHRASE_MAX_BYTES = 64 * 1024; confirmation reads
use CONFIRM_MAX_BYTES = 256. New secret-read sites must declare and enforce
their own cap instead of allowing unbounded growth of a zeroizing buffer.
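The read discipline can be sketched with the standard library alone. This is a hedged illustration: WipedBytes is a simple stand-in for the zeroize crate's Zeroizing wrapper (a plain loop like this may be optimized away, which is why real code should use zeroize), read_secret_line is a hypothetical name, and stopping at a newline is an assumption about the input framing.

```rust
use std::io::{self, Read};

pub const PASSPHRASE_MAX_BYTES: usize = 64 * 1024;

/// Stand-in for a zeroizing buffer: overwrites its bytes on drop.
pub struct WipedBytes(Vec<u8>);

impl WipedBytes {
    pub fn as_bytes(&self) -> &[u8] {
        &self.0
    }
}

impl Drop for WipedBytes {
    fn drop(&mut self) {
        for byte in self.0.iter_mut() {
            *byte = 0;
        }
    }
}

/// Read up to `cap` bytes, one byte per `read` call, stopping at newline or EOF.
/// Taking `Read` rather than `BufRead` means nothing past the newline is consumed.
pub fn read_secret_line<R: Read>(reader: &mut R, cap: usize) -> io::Result<WipedBytes> {
    // Pre-size the storage so pushes never reallocate and leave a stale copy.
    let mut buf = WipedBytes(Vec::with_capacity(cap));
    let mut byte = [0u8; 1];
    loop {
        match reader.read(&mut byte) {
            Ok(0) => break,
            Ok(_) => {
                if byte[0] == b'\n' {
                    break;
                }
                if buf.0.len() == cap {
                    return Err(io::Error::new(
                        io::ErrorKind::InvalidData,
                        "secret exceeds byte cap",
                    ));
                }
                buf.0.push(byte[0]);
            }
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    byte[0] = 0;
    Ok(buf)
}
```

Because each call consumes exactly the bytes of its own line, a confirmation read followed by a passphrase read on the same stream sees both values intact.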
Anything inside a Passphrase must reach subprocesses through
CommandRunner::run_with_stdin, never through CmdRequest::to_argv. The
Passphrase::expose_secret() method is the grep-friendly plaintext egress
point for these handoffs. ps(1) must never be able to surface a passphrase.
Generated random secrets must drop before any later syscall whose duration is
unbounded. In particular, generated keyfile bytes are scoped so the
Zeroizing<[u8; KEYFILE_SIZE]> is dropped before the durability
sync_all() on the written file.
Every type that owns secret bytes must implement Debug with redacted output.
The canonical rendering is <redacted>.
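A redacted Debug implementation is short enough to show in full. In this sketch the Passphrase type wraps a plain String as a stand-in for Zeroizing<String>, and the constructor name is an assumption; only the redacted rendering and the expose_secret egress point follow the text.

```rust
use std::fmt;

/// Stand-in for the secret newtype; the real type wraps a zeroizing string.
pub struct Passphrase(String);

impl Passphrase {
    pub fn new(plaintext: String) -> Self {
        Passphrase(plaintext)
    }

    /// The single, grep-friendly place where plaintext leaves the wrapper.
    pub fn expose_secret(&self) -> &str {
        &self.0
    }
}

/// Hand-written instead of derived, so `{:?}` can never print the plaintext.
impl fmt::Debug for Passphrase {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("<redacted>")
    }
}
```

Any struct that embeds such a value and derives Debug inherits the redaction, since the derived implementation delegates to this one.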
braid does not use in-process passphrase equality as an authentication mechanism. Normal passphrase verification is delegated to cryptsetup/LUKS. The only current in-process comparison is the local double-prompt confirmation flow for fresh formatting, where braid checks that two user-entered strings match before one becomes the new pool passphrase.
Threat Model
These rules harden braid’s in-process memory image against accidental plaintext
retention in process snapshots, core dumps, and swap residue. They do not
defend against a privileged attacker on the running host with ptrace,
/dev/mem, root access to /proc/<pid>/mem, or equivalent capabilities.
The target invariant is narrower: no plaintext beyond the smallest practical in-process window, and no untyped plaintext values at module boundaries.
Active – Refines 017-runtime-disk-membership.md.
Decision: LUKS UUID Is Disk Identity
Principle: Stable identifiers
Context
Runtime membership originally used the operator disk name as the key in
pool.json. The same name also appears in mapper names and LUKS labels, so
code could accidentally treat display/runtime handles as identity. That made
label drift, mapper drift, and cloned disks hard to reason about: a member
could be the same encrypted device while its label or mapper path changed, or
two different by-id paths could expose the same cloned LUKS header.
Decision
Use the LUKS UUID as the persistent disk identity. pool.json and
pending-op.json membership snapshots are keyed by canonical LUKS UUIDs.
DiskMember.name remains the operator-facing name and DiskMember.by_id
remains the hardware address used to reach the device. DiskMember.devid is
persisted only as prior-binding state for btrfs cases where the live device is
observable by devid but not by LUKS UUID, such as null_underlying mappers and
missing_devids.
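The membership shape described above can be sketched as a map keyed by LUKS UUID. The field names follow the text; everything else (the Membership alias, validate, names_for_display, String-typed UUIDs and paths) is a hypothetical simplification rather than the contents of cli/src/membership.rs.

```rust
use std::collections::{BTreeMap, HashSet};

#[derive(Debug, Clone)]
struct DiskMember {
    name: String,       // operator-facing name
    by_id: String,      // hardware address used to reach the device
    devid: Option<u64>, // prior btrfs binding, fallback only
}

/// Keyed by canonical LUKS UUID, so UUID uniqueness is structural.
type Membership = BTreeMap<String, DiskMember>;

/// Reject duplicates on the remaining axes: name, by-id, and devid.
fn validate(members: &Membership) -> Result<(), String> {
    let mut names = HashSet::new();
    let mut by_ids = HashSet::new();
    let mut devids = HashSet::new();
    for (uuid, m) in members {
        if !names.insert(m.name.as_str()) {
            return Err(format!("duplicate name {} (uuid {})", m.name, uuid));
        }
        if !by_ids.insert(m.by_id.as_str()) {
            return Err(format!("duplicate by-id {} (uuid {})", m.by_id, uuid));
        }
        if let Some(d) = m.devid {
            if !devids.insert(d) {
                return Err(format!("duplicate devid {} (uuid {})", d, uuid));
            }
        }
    }
    Ok(())
}

/// Map order is UUID order; display surfaces sort by name instead.
fn names_for_display(members: &Membership) -> Vec<&str> {
    let mut names: Vec<&str> = members.values().map(|m| m.name.as_str()).collect();
    names.sort();
    names
}
```

Keeping the UUID only as the map key is what removes the value-side luks_uuid field: there is no second copy that could disagree with the key.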
Fresh add and replace operations pre-generate the UUID that cryptsetup must
write, store that UUID in the journal before mutation, and pass it through the
structured CryptsetupLuksFormat request. User-supplied --luks-format-arg
values may not override --uuid or --label.
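The override guard could look roughly like this. check_luks_format_extras is a hypothetical name, and the sketch assumes extras arrive as already-split argv tokens; it rejects both the separate-value and the =value spellings of the two reserved options.

```rust
/// Hypothetical guard for user-supplied --luks-format-arg values.
fn check_luks_format_extras(extras: &[&str]) -> Result<(), String> {
    // Options braid sets itself from the journaled plan.
    const RESERVED: [&str; 2] = ["--uuid", "--label"];
    for &arg in extras {
        // "--uuid=VALUE" and "--uuid VALUE" both reduce to "--uuid".
        let flag = arg.split('=').next().unwrap_or(arg);
        if RESERVED.contains(&flag) {
            return Err(format!("{} is set by braid and cannot be overridden", flag));
        }
    }
    Ok(())
}
```

Rejecting at request-construction time keeps the journaled UUID and the UUID cryptsetup actually writes from diverging.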
Identity Boundaries
| Identifier | Role | Persistent identity? | Normal user vocabulary? |
|---|---|---|---|
| LuksUuid | Encrypted-volume identity used for membership correlation, journals, duplicate detection, and live probe checks. | Yes | No |
| DiskName | Operator-facing name used in commands, status summaries, mapper suffixes, and labels. | No | Yes |
| ByIdPath | Hardware address used to find, open, or format a disk before it is mapped. | No | Setup and repair only |
| DiskMember.devid | Prior btrfs binding used when btrfs can report a device by devid but no live LUKS UUID is observable. | Fallback binding only | Repair diagnostics only |
| braid-<DiskName> mapper name | Runtime handle passed to cryptsetup, btrfs, mount, and close operations. | No | Mostly hidden |
| braid-<DiskName> LUKS label | Human/debug label for LUKS headers and discovery bootstrapping. | No | Mostly hidden |
This means UUID identity does not move normal command vocabulary from names to
UUIDs. Operators still add, replace, remove, and read disks by names such as
toshiba1. UUIDs belong in pool.json, journals, machine-readable status, and
diagnostics where braid must prove that the encrypted member is the expected
one.
Benefits
- Single source of truth. pool.json has one persistent member identity: the LUKS UUID map key. Disk name, by-id path, and btrfs devid no longer duplicate or compete with a value-side luks_uuid field.
- Drift-tolerant member correlation. Commands resolve membership by UUID instead of reconstructing identity from braid-<name>. A member opened under a drifted mapper can still be recognized as the same disk, and cleanup paths close the observed mapper rather than the expected one.
- Safer recovery replay. Journals carry UUID-keyed pre-operation and target membership snapshots. Recovery can compare the live pool against the journaled member set by UUID/devid and re-check live UUIDs before replaying format, add, replace, resize, or close steps.
- Earlier clone and swap detection. Duplicate LUKS UUIDs are rejected before membership writes or destructive operations, and UUID mismatches catch disks that were swapped, cloned, or reformatted after the original plan was made. add and replace also re-probe the mounted pool at execution time before writing the journal, so confirmation/passphrase-window races still hit the UUID guard.
- Human-facing names stay human-facing. Operators still type and read disk names such as toshiba1; mapper names and labels remain braid-<DiskName>. UUIDs appear where they help diagnostics or machine-readable state, not as the normal command vocabulary.
- Present-device probes use live paths. Queries such as lsblk model/serial and smartctl use the live backing path (PoolState::underlying_for_uuid), and the TUI disk-detail LUKS metadata dump (cryptsetup luksDump) reads the live backing path for a verified-present (Unlocked) member – not persisted by-id setup/repair handles that can drift while the disk is still present. Metadata for locked or ownership-unverified mappers stays on the by-id handle.
Concrete Improvements
- Membership shape is simpler. Membership has one identity axis: UUID keys map to name/by-id/devid metadata.
- Formatting is crash-replayable. Fresh add and replace paths generate the UUID before mutation, journal it, and pass it to cryptsetup. Recovery can tell whether it is seeing the exact LUKS container that the interrupted plan intended to create.
- Cleanup follows observed ownership. lock classifies live mappers by UUID/devid and closes the mapper it actually observed. A mapper opened as braid-WRONG but owned by disk1 is closed as braid-WRONG; braid does not merely try braid-disk1 and leave the real mapper open.
- Recovery compares member sets by identity. Pending operations carry UUID-keyed pre-operation and target membership snapshots, so recovery can compare live topology with the journaled member set instead of re-discovering by label or assuming names still line up.
- Display code has an explicit join rule. User-facing summaries resolve a live pool device’s UUID back to DiskName for presentation. UUIDs remain available to verbose/machine-readable paths where they are useful evidence. The TUI Data-tab Bus column is the last display correlation to adopt this rule: its lsblk transport bridge now joins the parent disk’s LUKS UUID to the member name, so transport survives mapper drift like every sibling cell instead of blanking to --.
Runtime Handles And Labels
- Mapper names remain braid-<DiskName>.
- LUKS labels remain braid-<DiskName>.
- Both mapper names and labels are presentation/runtime handles, not identity.
- LuksUuid is the only persistent identity for membership decisions.
- Code may construct mapper_name(&member.name) when opening or addressing braid’s expected mapper.
- Code must not parse mapper names or LUKS labels to decide membership, target a member, or correlate live pool state. Narrow exceptions are allowed for bootstrapping and sanity checks only: discover bootstraps from cold braid-labeled disks; returning-disk adoption in add may gate on label match after identity correlation, which still uses LuksUuid/devid/FSID; fresh add and replace recovery may require the expected label before treating an already-formatted target as the crash-created LUKS container, but still requires the journaled UUID to match. lock may use the braid-* prefix only to discover cleanup candidates; member identity still requires UUID/devid evidence, and candidates whose backing LUKS UUID cannot be verified are warned and skipped.
- lock is the special cleanup case: classify live mappers by UUID/devid first, then close the observed mapper name, not a reconstructed mapper_name(&member.name), so drifted-but-member-owned mappers are closed correctly. If mounted per-device probing fails, lock reads the mounted filesystem FSID to key the exclusive-operation preflight (so it will not unmount mid balance/replace), then scans /dev/mapper/braid-* candidates and closes only those with verified backing LUKS UUIDs. The unmount is licensed by mount-point ownership, not an FSID identity match (see Limits And Non-Goals). If a null_underlying mapper’s persisted devid resolves to multiple membership UUIDs, lock warns, leaves that mapper open, and marks cleanup uncertain instead of demoting it to orphan cleanup.
- lock reports "disk <name>: already closed" only for members the planner has proved absent from every observed live state; it must not reconstruct mapper_name(&member.name) during execute to infer absence. If a mapper is skipped because classification failed, or /dev/mapper cannot be enumerated in either close-set arm, cleanup is uncertain and lock suppresses all already-closed claims for unobserved members.
- Commands that reuse an already-open expected mapper for a requested by-id path must verify the mapper’s canonical backing path before trusting the mapper’s LUKS UUID. A cloned LUKS header can give two physical devices the same UUID, so the runtime proof is backing path match first, then UUID match.
- Recovery must fail closed when a live btrfs device lacks an observable LUKS UUID and the journal has no persisted devid binding. It must not recover by inferring identity from braid-<DiskName>.
- replace must re-probe the mounted pool after confirmation and passphrase verification but before sleep inhibitor acquisition, journal write, or btrfs replace start. If the pool is no longer mounted, the FSID differs from the planned pool, or any live pool device has the replacement target’s LUKS UUID, replace fails closed with the canonical pre-journal validation or DuplicateUuid { scope: LivePool } refusal.
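The "backing path match first, then UUID match" rule for reusing an already-open mapper can be sketched as a two-step check. The types and names below (OpenMapper, MapperProof, verify_open_mapper) are hypothetical, and the sketch assumes both paths were canonicalized before the comparison.

```rust
#[derive(Debug, PartialEq)]
enum MapperProof {
    Trusted,
    WrongBackingPath,
    WrongUuid,
}

/// What a probe observed about an already-open mapper (hypothetical shape).
struct OpenMapper<'a> {
    backing_path: &'a str, // assumed canonical
    luks_uuid: &'a str,
}

fn verify_open_mapper(
    mapper: &OpenMapper,
    requested_path: &str, // assumed canonical
    expected_uuid: &str,
) -> MapperProof {
    // Step 1: the mapper must sit on the device the caller asked for.
    // A cloned header would pass a UUID-only check from the wrong disk.
    if mapper.backing_path != requested_path {
        return MapperProof::WrongBackingPath;
    }
    // Step 2: only then is the UUID meaningful evidence of ownership.
    if !mapper.luks_uuid.eq_ignore_ascii_case(expected_uuid) {
        return MapperProof::WrongUuid;
    }
    MapperProof::Trusted
}
```

The ordering matters: with a cloned header the UUID comparison alone succeeds, so the path comparison is the only step that can tell the two physical devices apart.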
Offline Disk State
A recorded member whose by-id path is present, whose LUKS header is readable,
and whose on-disk LUKS UUID matches the pool.json membership key is identity
verified. If that member is not assembled into the live btrfs pool, status and
TUI surfaces render it as offline, distinct from missing (device absent) and
unknown (braid cannot classify the state).
offline is deliberately cause-neutral. It can describe a locked member in a
degraded mount, an interrupted post-commit mutation, or another state where
membership and live btrfs topology have not yet been reconciled. Because those
causes have different remedies, braid status does not print an Action: hint
for offline rows.
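The three-way distinction can be sketched as a small classifier. This is a simplification under assumptions: the Observation fields and the Online variant are hypothetical, and an unreadable header and a mismatched UUID are both folded into Unknown here, whereas braid may render a mismatch differently.

```rust
#[derive(Debug, PartialEq)]
enum MemberState {
    Online,  // hypothetical label for "assembled into the live pool"
    Offline, // identity verified, not assembled
    Missing, // device absent
    Unknown, // cannot classify
}

/// Hypothetical probe result for one recorded member.
struct Observation {
    by_id_present: bool,
    on_disk_uuid: Option<String>, // None when the LUKS header is unreadable
    in_live_pool: bool,
}

fn classify(membership_uuid: &str, obs: &Observation) -> MemberState {
    if !obs.by_id_present {
        return MemberState::Missing;
    }
    match &obs.on_disk_uuid {
        // Identity verified: header readable and UUID equals the pool.json key.
        Some(u) if u.as_str() == membership_uuid => {
            if obs.in_live_pool {
                MemberState::Online
            } else {
                MemberState::Offline
            }
        }
        _ => MemberState::Unknown,
    }
}
```

Note that the classifier returns no cause for Offline, matching the cause-neutral rendering described above.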
braid doctor’s declared_disks check also surfaces an offline member as a
cause-neutral Warn, never Fail; Fail stays reserved for a live LUKS UUID
mismatch. When the pool is mounted but live topology cannot be probed, doctor
warns rather than claiming every declared member is assembled.
Limits And Non-Goals
- A LUKS UUID identifies an encrypted LUKS container, not a physical drive, enclosure slot, SATA port, or by-id path.
- A cloned LUKS header intentionally has the same UUID as its source. Braid treats that as a duplicate identity and rejects it; it does not invent a new member identity for the clone.
- Mapper and label drift are tolerated for correlation and cleanup, but braid does not silently rewrite drifted mapper names or labels back into the expected braid-<DiskName> form.
- devid remains btrfs state. It is allowed only as a prior binding for missing/null-underlying cases where btrfs can still identify a member but braid cannot currently observe the LUKS UUID.
- A member with neither observable LUKS UUID nor journaled/persisted devid is not recoverable by mapper-name inference. The right behavior is to preserve recovery state and require manual reconciliation.
- UUIDs are not a user-facing naming scheme. They may appear in diagnostics, pool.json, pending-op.json, and machine-readable output, but command selection and normal summaries should continue to use DiskName.
- lock’s mounted-fallback teardown unmounts the configured btrfs mount point (licensed by mount-point ownership, not an FSID identity match – braid persists no durable pool FSID to compare a probe against), then scans only /dev/mapper/braid-* and closes by backing LUKS UUID: verified member UUIDs close as members, verified non-member braid-* mappers close as orphans; non-braid-* devices and unverified candidates are skipped. The cleanup is scoped by the braid-* namespace plus UUID, not by which devices backed the unmounted filesystem. Consequence: a foreign btrfs at braid’s mount point would be unmounted (a non-destructive, EBUSY-safe umount with no -f/-l); a foreign filesystem normally sits on non-braid-* devices, so the realistic consequence is the unmount alone. This is accepted, and gating it would require a durable pool-FSID identity axis this decision deliberately omits to keep membership single-axis.
Tests That Enforce This
- cli/src/membership.rs unit tests pin UUID-keyed pool.json, reject stale value-side luks_uuid, and enforce duplicate checks across UUID, name, by-id, and devid axes.
- cli/src/types.rs and cli/src/cmd.rs unit tests reject user-supplied --uuid/--label extras and pin the structured cryptsetup luksFormat --uuid <uuid> --label <label> argv order.
- cli/src/status.rs unit tests pin compact status names by resolving live pool UUIDs back to DiskName, including a drifted mapper case.
- cli/src/status.rs and cli/src/tui/probe.rs unit tests pin that a present, LUKS-identity-verified member absent from the live pool renders offline, not missing or unknown.
- cli/src/doctor.rs unit tests pin that declared_disks renders verified members absent from the live pool as cause-neutral Warn, keeps UUID mismatches as Fail, preserves offline-pool identity-only behavior, and warns when mounted-pool topology cannot be probed.
- tests/cli/braid-status-rust.py pins that present disks’ rendered luks_uuid equals the real cryptsetup UUID and the pool.json membership key, and that name is the operator name, in intact and degraded states.
- tests/cli/braid-status-rust.py pins that a degraded mount with one closed verified member renders that member as OFFLINE in human output and offline in JSON while the pool summary remains degraded.
- tests/cli/braid-doctor-offline-member.py pins that a degraded mounted pool with one closed verified member makes declared_disks warn with offline wording, while a fully assembled pool and an offline pool remain Ok.
- tests/cli/status-mapper-drift.py pins that braid status resolves the operator name via the UUID join when a member is open under a drifted mapper (braid-WRONG), not the mapper basename, in both JSON and human output.
- cli/src/tui/probe.rs unit tests pin the TUI Data-tab Bus column’s transport join to the parent disk’s LUKS UUID, so a member open under a drifted mapper (braid-WRONG) still renders its bus instead of degrading to --.
- cli/src/tui/probe.rs unit tests pin that the disk-detail LUKS metadata dump reads the live backing path for a verified-present member (surviving by-id drift), and that a foreign / ownership-unverified mapper does not surface the live device’s metadata under the declared disk.
- cli/src/tui/probe.rs and cli/src/tui/browse/state.rs unit tests pin that the TUI Browse SMART picker resolves a verified-present member through its live backing path (PoolState.disk_underlying, shared with the Data-tab SMART loop) and an offline member through its persisted by-id handle, so the two SMART surfaces cannot disagree under by-id drift. tests/cli/braid-tui-browse.py pins the live /dev/vd* node end-to-end for a present, unlocked member.
- cli/src/lock.rs unit tests pin the normal UUID/devid-classified close set, observed-mapper closing, UUID-scanned fallback cleanup, orphan warnings for non-member UUID/devid cases, duplicate-devid null_underlying skip behavior, and skip warnings for unverified candidates.
- cli/src/remove.rs unit tests pin all live member devids into the pre-operation journal snapshot before mutation, so recovery has a legitimate fallback binding when LUKS UUID is not observable.
- cli/src/recover.rs unit tests verify recovery refuses a null-underlying member when the journal lacks both observable UUID and persisted devid, instead of falling back to mapper-name inference.
- cli/src/enroll_key_file.rs unit tests verify standalone enroll rejects a member whose live LUKS UUID does not match the pool.json membership key before any slot inventory or keyfile mutation runs. Enroll also re-probes each member’s live UUID again at its mutation boundary, after the passphrase prompt and before luksAddKey, to catch a disk swapped or reformatted during the prompt window: unit tests pin the standalone re-probe’s mismatch and fail-closed arms and the discovery->execute window closure (a swap that passes discovery is rejected at execute before any keyfile is enrolled or generated).
- cli/src/replace.rs unit tests verify ReplacePlan::execute re-probes the live pool before journal write, rejects unmounted/FSID-drifted/colliding live-pool state, and still proceeds when the fresh probe is clean.
- cli/src/recover.rs unit tests verify post-maintenance replace recovery re-probes the old mapper UUID before close, skips foreign mappers, and still closes owned active dm mappings without relying on /dev/mapper path nodes.
- cli/src/luks.rs and cli/src/probe.rs unit tests verify already-open expected mappers must have the requested backing path before UUID ownership is accepted.
- tests/cli/luks-mapper-drift.py verifies braid lock closes the observed drifted mapper owned by a member UUID.
- tests/cli/luks-lock-skipped-no-false-closed.py verifies skipped mapper uncertainty does not produce false already closed rows.
- tests/cli/unlock-uuid-mismatch.py, tests/cli/enroll-uuid-mismatch.py, and tests/cli/recover-replace-existing-luks-uuid-mismatch.py verify swapped or reformatted disks fail UUID re-checks before unsafe replay, slot enrollment, or mount.
- tests/cli/replace-new-in-pool-guard.py verifies duplicate LUKS UUIDs are rejected before braid writes membership or calls into btrfs mutation.
- tests/cli/replace-live-pool-collision-race-rejected.py verifies replace’s execute-time live-pool re-probe rejects a cloned replacement UUID added to the mounted pool while replace waits for confirmation.
- tests/cli/braid-add-cloned-luks-header-rejected.py and tests/cli/replace-cloned-luks-header-rejected.py verify cloned LUKS headers cannot make add or replace reuse a mapper opened from the wrong physical device.
- tests/cli/braid-add-persists-before-balance.py verifies fresh add writes canonical UUID-keyed membership, without a duplicate value-side luks_uuid, before post-add maintenance continues.
- tests/cli/braid-doctor-uuid-swap.py verifies braid doctor fails closed when a member’s live LUKS UUID diverges from its pool.json key, surfacing the swap before any mutating command runs.
Consequences
- pool.json key order is UUID order, not disk-name order. Display surfaces that need stable operator ordering must sort by DiskName.
- Recovery trusts journaled UUID-keyed membership snapshots for phase-specific replay and verifies live UUIDs again at mutation boundaries where a physical disk could have been swapped or reformatted.
- Mapper and label drift no longer break membership correlation, but drifted handles are not silently reconciled back into membership.
- Cloned disks with duplicate LUKS UUIDs are rejected before membership is written.
Rejected Alternatives
- Keep disk name as identity. Disk names are useful for humans but are not intrinsic to the encrypted device. Keeping them as identity preserves the label/mapper drift hazard.
- Use by-id as identity. by-id paths identify hardware slots/devices, not encrypted membership. They can change with enclosures or controller behavior, and they do not detect cloned LUKS headers.
- Use btrfs devid as identity. Devids are live filesystem state and are unavailable before mount. They remain useful only as fallback binding for missing or null-underlying devices.
See
- 017-runtime-disk-membership.md
- ../principles.md
- cli/src/membership.rs
- cli/src/journal.rs
- cli/src/recover.rs
- cli/src/lock.rs
Decision: Browse Tab Is Raw Output, Curated Tabs Are First-Class UX
Principles:
Context
The original standalone browse command exposed low-level btrfs command output
in its own TUI. The main braid tui later grew real top-level tabs for curated
pool views such as Data and Scrub. Keeping a separate browse runtime duplicated
input handling, event filtering, command-generation guards, and snapshot tests.
At the same time, not every useful operator view should become a polished TUI panel immediately. Some data is most useful as complete raw command output while braid is still learning which parts deserve a first-class workflow.
Decision
braid tui owns the interactive UI surface. The standalone browse command is
removed, and its raw inspection workflow lives as the Browse top tab.
The top tabs are:
- Data – curated pool and disk health UX.
- Scrub – curated scrub status UX.
- Browse – raw CLI output inspector.
Browse is intentionally low-level and pass-through. It may overlap curated tabs because overlap is not duplication here: Browse answers “what did the underlying tool say?” while curated tabs answer “what should the operator understand or do next?”
Features graduate out of Browse only when they need dedicated interaction, history, safety checks, progress semantics, or domain-specific summaries. Until then, Browse is the holding area for complete command output.
Consequences
- Raw command coverage and parser canaries must exercise Browse through braid tui, not a separate non-interactive browse command.
- The Browse tab can expose Btrfs and NUT commands even when related curated panels already exist.
- New Browse entries should be append-only within the Browse program/command menus unless they are promoted to a curated tab with a separate design reason.
Decision: Pool Lock Is Rust-Owned
Principles:
Context
The shell wrapper originally serialized selected CLI operations by taking
/run/braid-pool.lock before execing the Rust binary. It also performed
post-success lifecycle work such as mount-point permissions and
braid-online.service activation/deactivation.
That split created two sources of truth:
- The wrapper had to know which Rust subcommands mutate pool state.
- Rust had to know which commands read pool.json, prompt, probe devices, or write recovery journals.
The lists drifted. lock and enroll needed the same early serialization as
other mutators but were not naturally owned by the wrapper’s subcommand case
logic. Wrapper ownership also made it easier for Rust dispatch to grow a
pre-lock state read later, which would violate the stale-state invariant.
Decision
Rust dispatch (cli/src/main.rs) owns pool-operation locking. The
lock_policy function in cli/src/main.rs is the single source of truth for
mapping Commands variants to lock acquisition disciplines. Its wildcard-free
exhaustive match makes every new subcommand choose a discipline at compile
time. For commands whose policy acquires the pool lock, dispatch acquires
/run/braid-pool.lock before loading config, loading membership, probing pool
state, prompting, or writing journals.
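The value of a wildcard-free match is that adding a variant without choosing a discipline is a compile error. A minimal sketch follows; the Commands subset, the LockPolicy variants, and in particular the assignment of each command to a policy are assumptions made for illustration, not the real table in cli/src/main.rs.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum LockPolicy {
    PoolLock,                    // take /run/braid-pool.lock first
    StopCoordinatorThenPoolLock, // plain `braid lock` ordering
    NoLock,                      // assumed for read-only commands
}

/// Illustrative subset of subcommands.
enum Commands {
    Add,
    Remove,
    Replace,
    Unlock,
    Lock,
    Status,
}

fn lock_policy(cmd: &Commands) -> LockPolicy {
    // No `_ =>` arm: a new variant fails to compile until it is listed here.
    match cmd {
        Commands::Add | Commands::Remove | Commands::Replace | Commands::Unlock => {
            LockPolicy::PoolLock
        }
        Commands::Lock => LockPolicy::StopCoordinatorThenPoolLock,
        Commands::Status => LockPolicy::NoLock, // assumption, not confirmed above
    }
}
```

Dispatch would call lock_policy before any config load or probe, so the lock decision cannot depend on state read outside the lock.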
The shell wrapper is a pure exec shim. It only sets the module-controlled
PATH and execs the packaged Rust binary.
braid-online.service uses a distinct shutdown entry point:
braid lock --systemd-stop --deadline-secs <n>
The module option braid.lockSystemdStopDeadlineSecs controls <n>. Its
default is 270 seconds, and the module asserts that it is strictly below
braid-online.service TimeoutStopSec (300 seconds).
Lifecycle work also lives under the Rust-held pool lock:
- After every unlock, add, and recover attempt, success or failure, dispatch runs mark_online as a finalizer. The is_mountpoint gate inside mark_online short-circuits when the operation failed before mounting; the bootstrap-add and recover cases where the mount succeeded but a later step returned Err are exactly where this finalizer matters.
- Plain braid lock calls mark_offline after successful unmount/close.
- The lock path stops lifecycle-bound scrub units and BoundBy braid-online.service consumers before unmounting.
Systemd lifecycle synchronization is gated by systemd_lifecycle = true in
runtime config. modules/braid/cli.nix emits that flag for module-managed
installs; standalone CLI configs omit it and therefore skip
braid-online.service, scrub-unit, and BoundBy systemctl calls. The pool
lock and pool_access_group mount-root permission fixups still run outside
that gate.
Snapshot Rule On systemctl start
mark_online snapshots braid-online.service ActiveState at the start of
the pool-lock window. It starts the unit only if the snapshot was inactive or
failed.
It must skip active, activating, and deactivating. The deactivating
case is load-bearing: if a stop job is already running and its ExecStop
needs the pool lock, a new systemctl start braid-online.service would queue
behind that stop. If the caller already holds the pool lock, the queued start
can deadlock against the in-flight stop. Snapshot gating prevents the start
from being queued in that state.
Unknown snapshot results warn instead of starting. The pool remains mounted and usable, but automatic shutdown cleanup may be missing.
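The snapshot rule reduces to a small decision over the ActiveState string. The sketch below is a hypothetical reduction of mark_online's gate; StartDecision and start_decision are illustrative names, and any state the text does not name (for example reloading) falls into the warn-only arm.

```rust
#[derive(Debug, PartialEq)]
enum StartDecision {
    Start,       // safe to run `systemctl start braid-online.service`
    Skip,        // already up, coming up, or going down
    WarnUnknown, // leave the unit alone and warn the operator
}

/// Decide from the ActiveState snapshot taken at the start of the
/// pool-lock window.
fn start_decision(active_state: &str) -> StartDecision {
    match active_state {
        "inactive" | "failed" => StartDecision::Start,
        // "deactivating" is the load-bearing case: a start queued behind an
        // in-flight stop that needs the pool lock could deadlock.
        "active" | "activating" | "deactivating" => StartDecision::Skip,
        _ => StartDecision::WarnUnknown,
    }
}
```

Because the decision uses the snapshot rather than a fresh query, a stop that begins later in the window cannot turn a Skip into a queued start.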
Lock Tolerates Missing Or Corrupt Membership
Lock-side dispatch loads pool membership from pool.json only; it consults no
recovery journal. If pool.json is missing, unreadable, corrupt, or fails its
uniqueness checks, lock does not abort – it warns and proceeds with empty
membership. On the live plain-lock and braid-online.service ExecStop paths
the warning goes to stderr; under --dry-run it is folded into the stdout
preview to preserve the single-stream dry-run contract
(ADR 022).
Membership is advisory for lock, not authoritative – its only role here is to
attach friendly member names to status output. What lock closes is decided from
observed state, not from pool.json:
- mappers backing the live mounted pool, proven during the per-device probe by cryptsetup status + cryptsetup luksUUID;
- mounted-pool members whose backing device is gone (device: (null)), matched by their persisted btrfs device id;
- otherwise-stranded /dev/mapper/braid-* mappers, each confirmed by cryptsetup status + cryptsetup luksUUID (see ADR 024) before it is closed.
With empty membership these mappers classify as unnamed orphans rather than
named members and are still closed. Fallback scanning is limited to
/dev/mapper/braid-*; mounted-pool cleanup closes only the mapper paths
reported by the pool mounted at the configured mount point. A candidate that
fails verification, a /dev/mapper scan that fails, or a duplicate-devid
conflict is skipped with a warning and may leave cleanup incomplete – the
operator resolves it by re-running braid lock or reconciling pool.json.
This closes the failed-bootstrap-add lifecycle hole without a journal. A
bootstrap add can mount the pool and open its LUKS mappers, then fail before
braid writes the first pool.json. If shutdown follows,
braid-online.service ExecStop runs braid lock --systemd-stop, finds no
pool.json, and still unmounts and closes those mappers – because what to
close is read from the live mounted pool and the observed mappers, not from
pool.json.
Lock therefore needs no special case for which operation was interrupted. An
interrupted Remove, RemoveMissing, Replace, or live-pool Add is
reconciled by braid recover against its pending-op.json journal; lock
neither reads nor needs that journal to perform safe shutdown cleanup.
Stop Coordinator + Done Protocol
Plain braid lock acquires /run/braid-stop-coordinator.lock before the pool
lock. After cmd_lock finishes unmounting and closing LUKS, it writes
done\n to that coordinator file and then synchronously stops
braid-online.service.
The recursive ExecStop reentry runs
braid lock --systemd-stop --deadline-secs <n>. If the stop coordinator is
held, the reentry polls for either:
- done\n, which means the plain lock already completed the disk cleanup and the reentry can exit 0 immediately.
- coordinator release without done\n, in which case it may proceed to acquire the pool lock and run the cleanup itself.
- deadline expiry, in which case it exits 1 before systemd’s TimeoutStopSec can kill it.
This protocol replaces the stop-side snapshot gate from ADR 018 for plain
braid lock’s mark_offline. The synchronous stop is intentional: user
invocations should return only after braid-online.service is inactive, while
recursive ExecStop has a deterministic poll-out path instead of queuing
behind itself.
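One polling step of the reentry can be modeled as a pure function over what it observes. The names below (CoordinatorView, Reentry, poll_step) are hypothetical, and the precedence among the three outcomes when several hold at once is an assumption of this sketch rather than something the text specifies.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq)]
enum Reentry {
    ExitOk,       // plain lock already cleaned up: exit 0
    RunCleanup,   // coordinator released without "done": do the work
    ExitDeadline, // give up with exit 1 before TimeoutStopSec
    KeepPolling,
}

/// Hypothetical snapshot of the stop coordinator file.
struct CoordinatorView<'a> {
    held: bool,        // another process still holds the coordinator lock
    contents: &'a str, // current file contents
}

fn poll_step(view: &CoordinatorView, now: Instant, deadline: Instant) -> Reentry {
    if view.contents == "done\n" {
        return Reentry::ExitOk;
    }
    if !view.held {
        return Reentry::RunCleanup;
    }
    if now >= deadline {
        return Reentry::ExitDeadline;
    }
    Reentry::KeepPolling
}

/// Deadline derived from --deadline-secs.
fn deadline_from(start: Instant, deadline_secs: u64) -> Instant {
    start + Duration::from_secs(deadline_secs)
}
```

With the default of 270 seconds, the ExitDeadline arm fires 30 seconds before the 300-second TimeoutStopSec, leaving the reentry time to exit on its own.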
Between writing done\n and stopping braid-online.service, mark_offline
re-checks mountpoint -q and treats a check failure (e.g. OnlineError::Spawn
mid-shutdown) as still-mounted: it warns and skips the stop, leaving
braid-online.service active. The operator can re-run braid lock or
systemctl stop braid-online.service to recover. This mirrors the
“unknown snapshot results warn instead of starting” rule from the
Snapshot Rule On systemctl start
section: when state is unknown, the fail-safe direction is to leave the
lifecycle owner active rather than deactivate over a possibly live pool.
Consequences
- There is a single source of truth for the locked-command list and acquisition discipline: lock_policy in Rust dispatch.
- The wrapper cannot drift from Rust command semantics because it no longer interprets subcommands.
- Lock acquisition is the first real execution boundary for covered commands. Environment failures such as lock contention happen before recovery journals.
- mark_online must keep the start-side snapshot rule to avoid the deactivating deadlock.
- mark_offline must keep the stop coordinator and done\n protocol because it deliberately uses synchronous systemctl stop after cleanup.
- The pool lock is independent from the sleep inhibitor. The lock prevents stale concurrent state reads; the inhibitor still protects only the non-interruptible mutation window described in ADR 019.
Decision: Pin block-group-tree at mkfs time
Context
braid pins its toolchain to nixos-26.05’s btrfs-progs 6.19.1, whose default
mkfs feature set enables block-group-tree. braid requests that one feature
bit explicitly rather than inheriting it from the default, so the on-disk
feature set is determined by braid and not by whichever btrfs-progs the running
toolchain links. (The flag predates the 26.05 bump: under the older
nixos-25.11 btrfs-progs 6.17.1, which did not default block-group-tree, the
same flag made new pools forward-compatible with the 6.19 default.)
This pin is deliberately narrow. mkfs.btrfs still starts from the linked
btrfs-progs default feature set; braid only adds block-group-tree to that
set. The rest of the on-disk feature set continues to track btrfs-progs
defaults.
Decision
cli/src/cmd.rs passes -O block-group-tree on both mkfs.btrfs
invocations: single-disk bootstrap and RAID1 bootstrap. New pools carry the
block-group-tree bit explicitly – matching the btrfs-progs 6.19 default that
braid’s pinned toolchain ships – without freezing any other mkfs default.
The long form is preferred over the bgt alias because it is the documented
primary name and matches the kernel sysfs entry block_group_tree.
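As a rough illustration of the pin, an argv builder might look like the following. mkfs_btrfs_argv is a hypothetical helper, and the argument order and the -d raid1 -m raid1 profile flags are assumptions; the exact argv braid emits is pinned by the unit tests in cli/src/cmd.rs, not by this sketch.

```rust
/// Hypothetical argv builder for the two bootstrap shapes.
fn mkfs_btrfs_argv(devices: &[&str], raid1: bool) -> Vec<String> {
    let mut argv: Vec<String> = vec![
        "mkfs.btrfs".to_string(),
        // Request the feature explicitly instead of inheriting the default.
        "-O".to_string(),
        "block-group-tree".to_string(),
    ];
    if raid1 {
        // Assumed profile flags for data and metadata.
        for flag in ["-d", "raid1", "-m", "raid1"] {
            argv.push(flag.to_string());
        }
    }
    argv.extend(devices.iter().map(|d| d.to_string()));
    argv
}
```

Only the one feature is named, so every other mkfs default continues to come from the linked btrfs-progs.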
Where this is enforced
- cli/src/cmd.rs – MkfsBtrfs and MkfsBtrfsRaid1 build the mkfs.btrfs argv with -O block-group-tree.
- cli/src/cmd.rs – mkfs_btrfs_single_generates_correct_argv and mkfs_btrfs_raid1_generates_correct_argv assert the exact argv.
- tests/module/mkfs-block-group-tree.{nix,py} – VM coverage asserts the on-disk feature bit after braid add creates single-disk and RAID1 pools.
Notes
- block-group-tree is a compat_ro feature. The kernel rejects unsupported compat_ro bits for read-write mount but may still allow a read-only mount if no log replay is required. The kernel-side feature has been available since 6.1; NixOS 26.05 ships kernel 6.18, so normal braid read-write operation is always supported.
- Existing pools created before this pin are unaffected. Offline conversion is possible via btrfstune --convert-to-block-group-tree; braid does not wrap that.
- Forward-compat note: a rescue boot from very old live media (kernel <6.1) cannot read-write mount a block-group-tree pool. A read-only mount may still succeed if no log replay is needed. This is not a blocker because braid does not ship rescue media, but the constraint should stay visible.
Decision: Seal the offline pool mountpoint immutable
Context
The pool mountpoint (default /mnt/storage) is a plain directory on the root
filesystem. When the pool is mounted there, writes go to the pool; when it is
NOT mounted, that bare directory is still writable, so any process writing under
the path silently lands data on the ROOT disk. When the pool later mounts over
it, that data is shadowed (invisible), permanently consumes root space, and the
write looked like it succeeded. This is the classic “unmounted mountpoint”
data-safety bug.
braid sets the inode immutable attribute (FS_IMMUTABLE_FL, a.k.a. chattr +i)
on the bare mountpoint directory while it is unmounted:
- Unmounted: a create/write under the directory fails immediately with EPERM.
- A filesystem can still be mounted OVER an immutable directory; once mounted, the mounted filesystem’s own root inode governs writes, so normal pool writes work.
- The attribute is persistent inode metadata (survives unmount and reboot).
- Setting it requires CAP_LINUX_IMMUTABLE; braid already runs privileged.
braid is the correct owner because the invariant has a hard timing rule:
Only ever set +i when the path is NOT currently a mountpoint. Setting it on a mounted path seals the MOUNTED filesystem’s own root inode – blocking all pool writes and persisting on the pool until cleared.
braid knows the mount state and controls the lifecycle, so it can honor that rule
reliably. A bare tmpfiles chattr +i hack could not: it would seal the live pool
root during a nixos-rebuild switch performed while the pool is mounted. braid’s
unit gates on ConditionPathIsMountPoint=! and the in-CLI fd
STATX_ATTR_MOUNT_ROOT check, so it only ever seals the offline bare dir.
Mechanism (verified against the pinned kernel)
- Mount-over-immutable is allowed. There is no IS_IMMUTABLE check in the kernel mount path (reference/linux/fs/namespace.c); the guard lives only in fs/attr.c. So the pool mounts over the sealed dir.
- +i blocks metadata writes. may_setattr (reference/linux/fs/attr.c) returns -EPERM for chmod/chown/explicit-time changes on an immutable inode – the basis for the tmpfiles interaction below.
- The kernel refuses rmdir of an immutable dir. may_delete -> IS_IMMUTABLE -> -EPERM (reference/linux/fs/namei.c), so a sealed offline mountpoint cannot be silently removed and recreated mutable while offline.
- The fd-based mount-root check uses statx’s STATX_ATTR_MOUNT_ROOT, which is authoritative: unlike an st_dev-vs-parent comparison it also detects same-device and bind mountpoints (util-linux’s own mountpoint.c notes its st_dev fallback “is … not able to detect bind mounts”).
Decision
1. Always-on (non-configurable)
The seal is an unconditional safety invariant, in the same class as the baked-in
base mount options braid sets unconditionally – noatime
(ADR 015) and skip_balance
(Principles). There is no immutableWhenUnmounted knob.
Rationale: there is no legitimate “off” use case (writing the bare offline
mountpoint is the bug). The escape hatches that matter – graceful degradation
on an unsupported fs / old kernel (Unsupported / MountStateUnknown) and the
braid seal-mountpoint --unseal <path> lever – exist independently of any flag.
Tradeoff: the only capability lost is a declarative, rebuild-time off switch.
Recovery from any unforeseen interaction is the manual --unseal plus the
graceful self-disable, not a NixOS option flip. The always-on default is
reversible later if a concrete need ever surfaces (a knob could be re-added
trivially).
2. Close the boot window
A boot-time seal makes the invariant hold from boot, not only after the first
unlock. A NAS waiting for SSH unlock (auto-unlock off, or USB key absent –
braid-auto-unlock.service exits 0 on skip) otherwise sits offline-and-writable
indefinitely, and an unlock-path seal would never fire because nothing mounts.
3. Seal from the boot/activation unit ONLY
The seal lives in exactly one place: the braid-seal-mountpoint oneshot
(modules/braid/storage.nix). braid add does NOT seal, and neither does the
mount path. This is not a coverage gap – a create-time seal would be a redundant
AlreadyImmutable no-op – for two compounding reasons:
- The oneshot runs on every activation, not just reboot. braid-seal-mountpoint.service is Type=oneshot with no RemainAfterExit, so it returns to inactive (dead) once ExecStart exits (reference/systemd/man/systemd.service.xml). NixOS’s switch-to-configuration-ng starts all active targets and systemd re-enqueues their inactive (dead) Wants= dependencies, so the dead oneshot is started again on every nixos-rebuild switch/test as well as every boot (self-healing). You cannot enable braid or change braid.mountPoint without an activation that runs the seal.
- The mountpoint is static and pre-exists every pool. cfg.mountPoint is a single fixed path created by the tmpfiles rule d ${cfg.mountPoint} on every boot/activation, so the seal unit seals it (while offline) BEFORE any braid add can run. The pool then mounts OVER the already-sealed dir; +i persists on the underlying inode, and braid’s lock/unmount path never rmdirs or chmod/chowns the bare dir, so the next braid lock reveals it still sealed.
So any pool bootstrapped after braid is enabled inherits an already-sealed
mountpoint, and persistence carries the seal across every later unlock/lock with
no re-seal. The seal is NOT in the create/bootstrap path or the bring-online
mount path; the only seal call outside braid seal-mountpoint is the doctor’s
read-only probe.
The braid-seal-mountpoint unit is ordered before braid-auto-unlock.service.
Both are pulled in by multi-user.target; without the edge they race, and if
auto-unlock won it would mount the pool and the seal unit’s
ConditionPathIsMountPoint=! would then skip the seal. An auto-unlock-with-USB
NAS never boots offline, so without this edge nothing would ever seal the bare
dir. Ordering before auto-unlock runs the seal in the pre-mount window every
boot; auto-unlock then mounts over the sealed dir and persistence carries it.
When autoUnlock is disabled the unit does not exist and before is a harmless
no-op ordering string.
The doctor “offline + mutable -> Warn” check is the detection/self-heal signal
for the rare out-of-band unseal (e.g. a raw chattr -i); the next boot or
activation re-seals.
Static-vs-dynamic mountpoint distinction (Rockstor precedent)
Rockstor (a btrfs NAS) ships create-time sealing – commit
5836560bbd1430c99fc73e3b6408fe3dcfd2220b, “Make top level mount directories
read-only when unmounted. Fixes #1414” – BECAUSE its mountpoints are dynamic
per-object /mnt2/<name> dirs born at creation with no boot-time existence to
seal, and it has no boot re-seal. braid’s single static mountpoint plus an
activation/boot oneshot that fires before any create makes boot-only sufficient
and create-time redundant; braid’s boot re-seal also fixes Rockstor’s fragility
(create-only sealing never recovers from an out-of-band chattr -i).
Rockstor validates the MECHANISM: its bind_mount does mkdir -> chattr +i ->
mount --bind over the sealed dir (mount-over-immutable), and teardown does
chattr -i -> rmdir (the kernel refuses rmdir of an immutable dir – the same
basis as braid’s --unseal lever).
Revisit-if: if braid ever moves away from the single static mountpoint (e.g. per-subvolume mounts at distinct root-fs paths, born on demand like Rockstor’s), create-time sealing becomes necessary and this decision should be revisited.
Maintenance levers
braid seal-mountpoint is a visible command (cli/src/main.rs) with three forms
(cli/src/mountpoint_guard.rs):
- braid seal-mountpoint (no args) – the bare boot/internal form. Seals the configured mount_point. Best-effort: it always exits 0 (a missing/inert guard must not block boot) and is lock-free. This is what the oneshot runs.
- braid seal-mountpoint <path> – seal an explicit path. Lock-free, but reports an HONEST desired-state exit code: exit 0 iff the path ends up immutable (Set or AlreadyImmutable), non-zero otherwise. This is the remedy for separate-path subvolume mountpoints (below), where a silent best-effort exit 0 would hide an unprotected path the doctor cannot see.
- braid seal-mountpoint --unseal <path> – clear +i on an explicit path. Unlike the seal forms this is an operator remediation, not a boot action, so it (a) ACQUIRES the pool lock (fail-fast on contention), serializing against an in-flight unlock/lock so a concurrent mount cannot land the pool over a just-cleared bare dir; (b) REFUSES the currently configured mount_point (the live path must stay sealed while offline); (c) exits non-zero unless the path ends up mutable (Cleared or AlreadyMutable, so a repeat unseal of an orphan reports success).
All three forms route through the same fd-guarded enforce
(cli/src/mountpoint_guard.rs#enforce), which refuses any live mountpoint
(SkippedMounted) via STATX_ATTR_MOUNT_ROOT, so the levers only ever touch an
offline bare dir.
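The exit-code contract of the three forms can be sketched as a small pure function. The outcome names below come from this document; the `Form` enum, the `exit_code` function, and the concrete non-zero value 1 are assumptions for illustration and may not match cli/src/mountpoint_guard.rs. The sketch also leaves out the pool lock and the refusal of the configured mount_point that --unseal performs before any outcome is produced.

```rust
/// Outcomes named in this document; the enum itself is illustrative.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Outcome {
    Set,
    AlreadyImmutable,
    Cleared,
    AlreadyMutable,
    SkippedMounted,
    Unsupported,
    MountStateUnknown,
}

/// Hypothetical tag for the three command forms.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Form {
    BareBoot,   // braid seal-mountpoint
    SealPath,   // braid seal-mountpoint <path>
    UnsealPath, // braid seal-mountpoint --unseal <path>
}

/// Map (form, outcome) to a process exit code.
fn exit_code(form: Form, outcome: Outcome) -> i32 {
    let ok = match form {
        // Best-effort: never block boot, whatever happened.
        Form::BareBoot => true,
        // Honest desired state: the path must end up immutable.
        Form::SealPath => matches!(outcome, Outcome::Set | Outcome::AlreadyImmutable),
        // Honest desired state: the path must end up mutable.
        Form::UnsealPath => matches!(outcome, Outcome::Cleared | Outcome::AlreadyMutable),
    };
    if ok { 0 } else { 1 } // 1 stands in for "non-zero"
}
```

Keeping the mapping separate from the fd-guarded enforce step means the boot form and the operator forms can share one code path and differ only in how they report the result.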
Doctor detection
braid doctor is the sole non-boot detection signal under the boot-only model.
The pure classifier cli/src/doctor.rs#classify_mountpoint_immutability warns
when the pool is offline and the mountpoint is mutable (invariant not yet held –
self-seals on the next boot/activation, or run braid seal-mountpoint), and fails
when the pool is mounted and the inode is immutable (a live pool root was sealed
– a tripwire that should never fire). Both the mount-state and immutability
inputs are tri-state, so a failed probe or an unsupported root suppresses the
finding rather than producing a misleading hint – the seal unit owns the single
“protection unavailable” warning.
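A minimal sketch of the tri-state classification described above, assuming hypothetical `Tri` and `Finding` types (the real signature of classify_mountpoint_immutability is not shown in this document):

```rust
/// Tri-state probe result; Unknown covers a failed probe or an unsupported root.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tri {
    Yes,
    No,
    Unknown,
}

/// Illustrative finding levels.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Finding {
    Ok,
    Warn,
    Fail,
    Suppressed,
}

/// Pure classifier: no I/O, just the decision table.
fn classify(mounted: Tri, immutable: Tri) -> Finding {
    match (mounted, immutable) {
        // Any unknown input suppresses the finding instead of guessing.
        (Tri::Unknown, _) | (_, Tri::Unknown) => Finding::Suppressed,
        // Offline and mutable: invariant not yet held.
        (Tri::No, Tri::No) => Finding::Warn,
        // Mounted and immutable: a live pool root was sealed.
        (Tri::Yes, Tri::Yes) => Finding::Fail,
        // Offline and sealed, or mounted and writable: expected states.
        (Tri::No, Tri::Yes) | (Tri::Yes, Tri::No) => Finding::Ok,
    }
}
```

Because the function takes already-probed inputs, every cell of the table can be unit-tested without a real mountpoint.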
Caveats
External writers (intended behavior change)
This is a behavior change for operator-configured services, not a no-op. On a
NAS, services like Samba/NFS exports, Syncthing, Nextcloud, or cron/rsync backups
are routinely wantedBy multi-user.target and will write to /mnt/storage while
the pool is offline (auto-unlock skipped or USB absent, awaiting SSH unlock). With
+i those writes now fail with EPERM. That is the intended win: a loud EPERM
replaces the silent write-to-root that leaked space and got shadowed on mount. An
operator whose backup/share service runs while the pool is offline should expect
the new EPERM.
Sole-mounter / fstab assumption
This invariant assumes braid is the only thing mounting the path. The module
replaced the fileSystems entry, so braid is the sole mounter by design – there
is no fstab entry racing it. If an operator adds their own fstab line or mount
unit for the pool, external mount/unmount can bypass the seal and the invariant
can drift; the doctor check is the detection mechanism.
Reconfiguration (changing mountPoint)
braid seals and checks only the CURRENTLY configured mount_point. If an operator
changes braid.mountPoint (say /mnt/storage -> /srv/storage), the
nixos-rebuild switch that applies the change runs the seal oneshot for the NEW
path during that same activation, so the new path is sealed promptly. braid does
NOT auto-clear the OLD one – the old bare directory keeps its +i until cleared,
so a later rmdir or reuse of the old path fails with EPERM. This is the same
class as any NixOS path option (changing dataDir leaves the old directory
behind); braid does not track prior mountpoints.
Remediation is the explicit-path clear lever (not chattr, which is absent from
the appliance wrapper PATH): braid seal-mountpoint --unseal /mnt/storage. The
old path is offline, so the fd guard clears it safely, and --unseal refuses only
the currently configured mount_point (now /srv/storage), so clearing the OLD,
no-longer-configured path is allowed. The doctor cannot surface the orphaned old
path (without a recorded prior mountpoint it has nothing to probe), so
discoverability is via this doc and the EPERM-on-rmdir symptom, by design.
Separate-path subvolume mounts (not auto-sealed)
The boot seal covers ONLY cfg.mountPoint. braid documents and tests a pattern
(Mounting subvolumes) that mounts
subvolumes at SEPARATE root-fs paths – e.g. /var/lib/jellyfin/media – via
systemd.mounts with bindsTo = braid-online.service. When the pool is offline
those mount units are stopped, leaving bare root-fs directories at those paths,
so an undocumented writer there lands data on root – the identical bug, NOT
covered by the boot oneshot (it seals one static path).
- Subvolumes mounted UNDER the sealed /mnt/storage are inherently protected by the parent seal and are the safe default.
- Subvolumes mounted at separate paths are an advanced, operator-opt-in pattern. This decision does NOT auto-seal them; it documents the limitation and points operators at the manual braid seal-mountpoint <path> lever (whose honest exit codes matter precisely because the doctor cannot see these paths).
The manual lever is honestly half-protective (not self-healing, and the doctor
cannot see these paths). Revisit-if: a fully-declarative
braid.extraSealedMountPoints list that the boot/activation oneshot would seal
alongside cfg.mountPoint (with the same auto-seal + re-seal + doctor coverage).
It is additive – it does not reopen Decision 1’s no-knob stance – but it is a
real new public option with non-trivial scope (a multi-path seal loop, per-path
doctor coverage, and a correctness wrinkle the static pool mountpoint does not
have: a systemd.mounts target dir may not exist until first mount, so an
offline-before-first-mount path reports Absent until created). Deferred until
the manual lever proves insufficient.
Filesystem support
FS_IMMUTABLE_FL is effectively universal on real Linux roots
(btrfs/ext4/xfs/f2fs/tmpfs all implement .fileattr_set). The Unsupported
self-disable realistically fires only on non-NAS roots (vfat/9p/nfs), so it is a
genuine but rare escape hatch, not a central rationale pillar. When it fires the
seal unit emits one clear “root filesystem does not support the immutable
attribute” warning, and the doctor stays quiet (it does not contradict that
signal with an un-actionable reseal hint).
Dry-run / preview
Nothing to integrate. No braid plan-and-execute command seals the mountpoint, so
ADR 022 imposes no obligation here: the seal is an
ambient systemd-unit-managed invariant (the same class as the tmpfiles
d ${cfg.mountPoint} rule), applied by the boot/activation oneshot outside the
plan/preview/execute model.
See
- modules/braid/storage.nix – the braid-seal-mountpoint oneshot.
- cli/src/mountpoint_guard.rs – the guard, the seal site, and the maintenance levers.
- cli/src/doctor.rs#classify_mountpoint_immutability – the detection signal.
- ADR 018: Systemd lifecycle – the unit lifecycle model.
- Mounting subvolumes – the separate-path caveat.
Decision: Release process
Context
braid had no releases: no git tags, the version hardcoded in two places, and the
sole consumer (caja) tracked master HEAD. We want a repeatable
just release patch|minor|major that bumps + tags + publishes, a binary cache so
consumers do not recompile Rust on the NAS, and a “pin to latest release” story.
The hard constraint that shaped the design: the maintainer’s Mac cannot build the
x86_64-linux binary consumers need. The nix-darwin linux-builder advertises
aarch64-linux only; x86 emulation is intentionally omitted. So the x86_64-linux
build and the cache push run in GitHub Actions on the release tag, not locally.
A precondition: the braid repo is public. That makes GitHub-hosted Actions
runners free, lets the github:danneu/braid?ref=release flakeref resolve without
a token, and keeps CACHIX_AUTH_TOKEN unexposed to forks.
Decision
Consumer pin = a moving release branch
The release fast-forwards a release branch to each vX.Y.Z tag’s commit.
Consumers pin braid.url = "...?ref=release"; nix flake update braid is the
“upgrade to newest release” button, and flake.lock still pins the exact rev.
This is the ecosystem convention for a release channel (NixOS/nix
latest-release, cachix latest) and mirrors how a consumer already follows a
nixos-26.05 branch and lets the lockfile pin.
The release branch is machine-owned: only release.yml advances it, and only
to a master-descended commit (enforced by the ancestry guard below). Never commit
to it, and ensure no branch protection blocks the Actions token’s push.
Version single source of truth = cli/Cargo.toml
flake.nix#commonArgs reads pname + version from cli/Cargo.toml via
craneLib.crateNameFromCargoToml (a pure path read, no IFD). braid --version
already reads CARGO_PKG_VERSION from the same manifest via clap. So
cli/Cargo.toml is the only version string in the repo, and cargo release
bumping it is the only version edit.
The invariant is enforced, not merely conventional: the flake check
eval-version-matches-cargo (tests/eval/version-matches-cargo.nix) asserts the
built braid-cli-unwrapped.version equals cli/Cargo.toml’s [package]
version. It is trivially true while the flake reads from the manifest, but fails
loudly if anyone reintroduces a hardcoded flake literal that then drifts.
Version bump = cargo-release; build + publish in CI
just release (Mac-side) runs cargo release from the workspace root, so its
config lives in [workspace.metadata.release] in the root Cargo.toml. Two
independent publish guards: [workspace.metadata.release] publish = false stops
cargo release from touching crates.io, and [package] publish = false in
cli/Cargo.toml makes a direct cargo publish refuse outright. tag-name = "v{{version}}" overrides cargo-release’s workspace-member default
(braid-cli-v{{version}}), because the release-branch FF and gh release flow
assume vX.Y.Z.
Pre-1.0 bumps are plain semver: patch 0.0.1->0.0.2, minor->0.1.0,
major->1.0.0 (so minor’s jump to 0.1.0 is expected, not a surprise). The
in-tree pre-release version is 0.0.0, so the first just release patch cuts
v0.0.1 through the same path as every later release – there is no special-case
bootstrap.
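The bump arithmetic can be written out as below. This models plain semver increments only; `Version`, `Level`, and `bump` are names invented for the example and say nothing about how cargo-release is implemented.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Version {
    major: u64,
    minor: u64,
    patch: u64,
}

#[derive(Debug, Clone, Copy)]
enum Level {
    Patch,
    Minor,
    Major,
}

/// Plain semver bump: lower-order components reset to zero.
fn bump(v: Version, level: Level) -> Version {
    match level {
        Level::Patch => Version { patch: v.patch + 1, ..v },
        Level::Minor => Version { minor: v.minor + 1, patch: 0, ..v },
        Level::Major => Version { major: v.major + 1, minor: 0, patch: 0 },
    }
}
```

Starting from the in-tree 0.0.0, a patch bump gives 0.0.1, which is why the first release needs no special case.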
The tag triggers .github/workflows/release.yml, a single sequential job ordered
cheapest-gate-first: ancestry guard -> tag/version guard -> Rust test + version
eval gate -> build x86_64-linux -> push cache -> create GitHub release ->
fast-forward release. Two guards close the trust gap before any build or cache
write: an ancestry guard (git merge-base --is-ancestor) rejects any v*
tag whose commit is not on master, and a tag guard rejects any tag that is
not ^vX.Y.Z$ equal to cli/Cargo.toml’s version at the tagged commit. The
release FF is the last step and the sole consumer-visible “it’s released”
gate: it lands only after the cache is warm and the GitHub release object exists,
so no consumer can nix flake update to a half-published rev. Every step is
idempotent, so a failed run is re-runnable from the Actions UI. The GitHub
release body is rendered by git-cliff, pinned in the .#release devShell and
invoked as nix develop .#release -c git-cliff, from cliff.toml: conventional
commit types are grouped into stable sections such as Features, Bug Fixes,
Documentation, Tests, CI, Build, and Chores, while unmatched commit subjects
land in Other. The first release (v0.0.1) is a one-time exception and publishes
an intentionally blank body instead of a whole-history changelog; later genuinely
empty rendered ranges get the _No notable changes._ placeholder.
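The tag guard's rule (the tag must have the vX.Y.Z shape and equal the manifest version) can be sketched as a pure check. `tag_matches_version` is a hypothetical name, and the real guard runs as a step in release.yml rather than as Rust; this only illustrates the predicate.

```rust
/// True iff `tag` is `v` followed by exactly three dot-separated
/// numeric components and that version equals `cargo_version`.
fn tag_matches_version(tag: &str, cargo_version: &str) -> bool {
    let rest = match tag.strip_prefix('v') {
        Some(rest) => rest,
        None => return false, // e.g. the braid-cli-v... default shape
    };
    let parts: Vec<&str> = rest.split('.').collect();
    let is_xyz = parts.len() == 3
        && parts
            .iter()
            .all(|p| !p.is_empty() && p.bytes().all(|b| b.is_ascii_digit()));
    // Shape and equality must both hold.
    is_xyz && rest == cargo_version
}
```

Checking shape and equality together rejects both a mistyped tag and a tag pushed before the manifest bump landed.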
Public cachix cache braid
The cache is public; consumers add the substituter https://braid.cachix.org +
its public key and need no auth. release.yml sets skipPush: true on
cachix-action and does one explicit cachix push braid <out>, so only
braid-cli-unwrapped (x86_64-linux) lands in the cache – exactly what the
module default (flake.nix#nixosModules.default, which sets package = braid-cli-unwrapped) consumes. The wrapped braid would duplicate all storage
tools for no consumer benefit.
Behavioral gate is local, not in CI
braid does not run the NixOS VM suite in GitHub Actions, and neither
just release nor release.yml requires a VM result. The release path runs the
Rust tests (just test-rust) and the version-SoT eval check in release.yml on
the tag, then builds and publishes. .github/workflows/test.yml stays
workflow_dispatch-only (its push/pull_request triggers remain disabled).
VM coverage is a manual, per-release choice: when a release warrants it, run the
suite outside the release automation – just test-vm locally, or a
workflow_dispatch run of test.yml. just release keeps a cheap local compile
gate (nix build braid-cli-unwrapped, darwin-native) so a Rust compile break is
caught before the irreversible tag, but it does not gate on a VM run.
This is a deliberate scope choice: the VM suite is slow and runs on the
maintainer’s machine through the linux-builder, and keeping it out of the release
path keeps releases fast and free of VM flakiness, without turning the expensive
VM workflow into a push-triggered CI gate. The tradeoff is that release
behavioral coverage rests on maintainer discipline, not an automated CI gate.
Revisit-if braid gains additional maintainers or the VM suite becomes cheap
enough to run in CI; at that point a master VM gate plus a fail-closed parent
check in just release would make the behavioral gate automatic.
No-follows is the recommended consumer default
This ADR is the authoritative home for the cache-path-identity rationale; ADR 010
points here. The recommended consumer snippet does not set
braid.inputs.nixpkgs.follows. With no follows, braid’s nixpkgs input stays on
its pinned nixos-26.05 – the exact nixpkgs the release cache is built against
– so braid-cli-unwrapped resolves to the cached store path: a cache hit.
Setting follows = "nixpkgs" rebuilds braid-cli-unwrapped against the
consumer’s nixpkgs, producing a different store path and a cache miss (the NAS
recompiles Rust, defeating the cache). follows remains a valid advanced opt-out
(smaller closure via nixpkgs dedup) at the cost of release-cache path identity;
it also moves the pinned tool versions onto the consumer’s nixpkgs (see ADR 010).
This aligns docs with reality: the deployed consumer already runs no-follows with a deliberate “do NOT set follows” tool-version-boundary comment.
Public-repo trust model (every secret-bearing workflow)
Going public widens the threat model for every workflow that consumes a secret, not just the release path:
- release.yml is fork-safe by trigger: push: tags only, no pull_request, and forks cannot push tags upstream, so CACHIX_AUTH_TOKEN never reaches a fork.
- claude.yml is not trigger-safe – it fires on public issue/comment/review events. It is hardened with a trusted-author gate: each event arm ANDs the @claude trigger with author_association in OWNER/MEMBER/COLLABORATOR, so a stranger’s @claude comment never starts the job and cannot spend CLAUDE_CODE_OAUTH_TOKEN.
Any future workflow that consumes a secret must re-clear this bar before it merges.
Risks / gotchas
- publish = false (both layers) is mandatory – braid is a private crate that must never reach crates.io.
- Dangling tag on CI failure: the tag exists but the release FF (the last step) never runs, so release does not advance and consumers are unaffected no matter which earlier step failed. Recover by re-running the same workflow (transient/config failures) or by fixing master and moving the same version tag to the fixed commit (the runbook has exact commands).
- Cache trust on the consumer: skip the public-key step and the consumer reaches the cache but rejects the signature and silently rebuilds from source.
- Master protection vs. the bump push: cargo release pushes the bump commit straight to master; a required-PR ruleset on master that does not exempt the releaser makes just release fail after the local commit/tag. Keep the releaser exempt.
- Concurrent releases: concurrency serializes release runs (cancel-in-progress: false) and queue: max lets up to 100 later tags wait (FIFO by the time each starts waiting on the group), so a burst of tags drops no release. Because that order is wait-start time, not dispatch time, a near-simultaneous burst can still start out of order and fail an older tag’s release FF as a non-fast-forward (benign – release only moves forward – but a red run). The runbook rule stays one active release tag at a time.
See
- justfile – the release recipe (Mac-side bump + local gates).
- .github/workflows/release.yml – CI build, cache push, GitHub release, release FF.
- cliff.toml – git-cliff template + commit-group config for the GitHub release-notes body.
- tests/eval/version-matches-cargo.nix – the version single-source-of-truth eval guard.
- Releasing – the operator runbook.
- Toolchain pinning – no-follows default and parser-critical tool pinning.
Decision: SMART + btrfs error reporting
Context
Before this change, braid status reported btrfs device-error counters per disk
but nothing about SMART. The only SMART signal status surfaced was a global
smartd alert flag. The TUI showed SMART solely as a bare health enum
(ok/warning/failing) in a column, with no way to see why a drive was
degraded. SMART health was computed in parse_smartctl and then discarded –
classify_sata/classify_nvme read the underlying counters (reallocated/
pending/uncorrectable sectors; NVMe media errors, wear, spare) only to collapse
them to a SmartHealth enum.
These are observations from two different layers: the filesystem’s own I/O accounting (btrfs device errors) versus the drive’s self-report (SMART). A degraded drive can show clean btrfs I/O, and a drive with btrfs errors can pass SMART. They should be surfaced as two explicitly-named concepts, not merged behind one vague “Errors” label.
Decision
Two named concepts, not one merged “Errors”. The --json per-disk field
errors renames to btrfs_errors; a new sibling object smart carries the SMART
self-report. The human per-disk block relabels its Errors: line to btrfs: and
gains a parallel SMART: line. (braid is pre-v1.0 with no on-disk-format
backwards-compatibility obligation, so the field rename is a hard break with no
shim.)
smart is a verdict plus evidence, not a flat count. SMART’s authoritative
signal is a pass/fail verdict (health); the counters are supporting evidence
behind it. A single summed smart_errors integer was rejected: it mixes units
(reallocated sectors, wear percent, media errors, spare percent are not
addable) and would render 0 on a drive reporting passed:false – the exact
case where the operator most needs a signal. So health is the headline and the
counters are itemized beneath it.
A protocol discriminator (sata/nvme). The evidence field set differs by
transport (SATA ATA attributes vs the NVMe health-information log), so the smart
object is tagged by protocol to keep the shape unambiguous and
forward-compatible. NVMe is fully implemented, not deferred: media_errors is a
clean headline parallel to SATA reallocated_sectors, and the NVMe spare check is
a threshold pair (available_spare <= available_spare_threshold), not a generic
> 0 rule – a flat numeric rule would misread a healthy available_spare of 100.
One threshold definition feeds all three surfaces. SmartEvidence::fields()
yields each display field as (key, value, is_concern); concerns() is its
is_concern subset. The verdict (Healthy iff concerns().is_empty()), the human
SMART: parenthetical, and the TUI evidence rows (red iff is_concern) all key
off this one structure and a per-field SmartField::label(), so the column
verdict, the human line, and the TUI rows cannot disagree on either the threshold
or the wording.
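A reduced sketch of the "one structure feeds every surface" idea follows. The field names reallocated_sectors, media_errors, available_spare and available_spare_threshold come from this document; the other two SATA key names, the `> 0` rule for the plain counters, the `is_healthy` helper, and the omission of NVMe wear (its threshold is not stated here) are assumptions of the example, not braid's actual types.

```rust
/// Illustrative evidence type, tagged by transport.
enum SmartEvidence {
    Sata {
        reallocated_sectors: u64,
        pending_sectors: u64,
        uncorrectable_sectors: u64,
    },
    Nvme {
        media_errors: u64,
        available_spare: u8,
        available_spare_threshold: u8,
    },
}

impl SmartEvidence {
    /// Each display field as (key, value, is_concern).
    fn fields(&self) -> Vec<(&'static str, u64, bool)> {
        match *self {
            SmartEvidence::Sata { reallocated_sectors, pending_sectors, uncorrectable_sectors } => vec![
                // Assumed rule for plain counters: any non-zero value is a concern.
                ("reallocated_sectors", reallocated_sectors, reallocated_sectors > 0),
                ("pending_sectors", pending_sectors, pending_sectors > 0),
                ("uncorrectable_sectors", uncorrectable_sectors, uncorrectable_sectors > 0),
            ],
            SmartEvidence::Nvme { media_errors, available_spare, available_spare_threshold } => vec![
                ("media_errors", media_errors, media_errors > 0),
                // Threshold pair, not `> 0`: a spare of 100 is healthy.
                (
                    "available_spare",
                    u64::from(available_spare),
                    available_spare <= available_spare_threshold,
                ),
            ],
        }
    }

    /// The is_concern subset of fields().
    fn concerns(&self) -> Vec<(&'static str, u64, bool)> {
        self.fields().into_iter().filter(|f| f.2).collect()
    }

    /// Healthy iff there are no concerns.
    fn is_healthy(&self) -> bool {
        self.concerns().is_empty()
    }
}
```

Since the verdict is computed from the same rows that get displayed, a red row and a healthy verdict cannot coexist.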
Column-summary vs detail-evidence split. The TUI disk-table column stays the
bare health verdict (unchanged). The error evidence lives in the per-disk detail
panel as a new SMART section, sibling to the existing btrfs Device Errors
section. celsius ships in the --json smart object but is not shown in the
SMART detail section (it has its own Temp column and is not a verdict input).
status probes smartctl plainly, per disk. Each braid status now spawns
one smartctl -H -A --json per present disk (reusing the command the TUI already
runs). No -n standby guard is needed: status reaches this live SMART probe
only for a mounted pool, and ADR 031 treats mounted
member disks as awake. The future locked-only braid.autoSpinDown does not
overlap this mounted-only probe. The probe is failure-tolerant – any error
collapses to an unknown verdict – so a flaky or absent smartctl never fails
a status build. This affects only the CLI status path, not the monitor daemon.
Per-disk smart is diagnostic evidence only – it does not feed the alert
latch. The “SMART health warning” alert cause stays AlertCause::SmartdAlert,
driven by the smartd daemon’s flag (/var/lib/braid/smartd-alert; see
ADR 014). A live smart.health == "warning" from the new
per-disk probe must never synthesize an AlertCause. So a status report can show
a degraded smart object while alert_active is false – this is intentional,
and is documented so the two SMART signals (the live diagnostic probe vs the
smartd latch) are not conflated. smartd remains the single SMART alert source
because it watches continuously between status runs and applies its own vendor
thresholds; the live probe is a point-in-time diagnostic.
Consequences
- The serialized contract grows: SmartProbe/SmartHealth/SmartEvidence are now part of the --json surface. The stable-only smartctl golden fixture is the drift canary on smartmontools bumps (virtio disks emit no SMART, so the live VM canary cannot exercise this path).
- classify_sata/classify_nvme/classify_health are removed; their thresholds now live once in SmartEvidence::fields. The verdict is derived from the evidence at a single call site, so the column, the human line, and the TUI detail cannot drift apart.
- Every braid status now does one synchronous smartctl spawn per disk. This is accepted per the mounted-pool drive-wake posture above.
See
Decision: Drive-wake posture
Principle: HDD defaults
Context
braid previously treated all pool disks as potentially asleep at any time. The
TUI reflected that blanket anti-wake posture by keeping the expensive pool probe
manual-only: smartctl -H -A --json per disk plus btrfs state was run only on
startup or r.
That posture was broader than the actual ownership boundary. Today braid does
not park drives with hdparm -S or any equivalent per-drive standby timer.
While the pool is mounted, btrfs and normal NAS access already make member
drives active. A future opt-in braid.autoSpinDown may park drives, but that
feature belongs to the locked state and will own the “do not wake a parked
locked member” rule.
Decision
While the pool filesystem is mounted (PoolStatus::Mounted in the TUI,
pool.mounted in status/doctor), braid treats member disks as awake and
reads/refreshes them freely. The anti-wake concern applies only to the locked
state and is owned by the future opt-in braid.autoSpinDown feature, which will
gate on braid-online.service. braid adds no online-side standby detection.
The automatic read boundary is narrow: the TUI pool auto-refresh loop is the only automatic disk-read loop added here, and it re-arms only while the live pool is mounted. It does not run while the pool is locked, not mounted, or in an error state.
The user-invoked read paths keep their existing semantics:
- braid status live SMART is mounted-only; it returns the not-mounted status before spawning per-disk smartctl.
- braid doctor SMART self-test diagnostics may target a member’s persisted by_id path even when the pool is unmounted or locked.
- TUI Browse-tab SMART commands are explicit user actions and may target a member’s persisted by_id path regardless of mount state.
“Online” in this decision means the mounted live pool (PoolStatus::Mounted or
pool.mounted). braid-online.service is a correlated systemd lifecycle marker,
not something these read paths consult; it is the handle the future
braid.autoSpinDown feature will gate on.
Locked-state TUI probes are not claimed to be non-waking. The pre-existing TUI
startup/manual probe builds LUKS state before the mount gate and reads on-disk
LUKS2 metadata with cryptsetup luksDump --dump-json-metadata <by-id> for locked
members. That read can wake a parked drive. This decision leaves that behavior
unchanged and only ensures the new automatic pool loop is mount-gated.
Non-goals
- No smartctl -n standby.
- No Standby SMART health state.
- No braid.autoSpinDown implementation.
- No hdparm integration.
- No NixOS module changes.
- No smartd configuration changes.
- No change to status’s mounted-only SMART probe.
- No change to explicit doctor or Browse-tab SMART reads.
- No auto-refresh while locked.
Alternatives considered
Online-side standby detection
Rejected. Adding smartctl -n standby and a Standby health state creates
parser and UI state complexity for a state braid does not create while the pool
is mounted. If an operator has configured an out-of-band standby timer, the cost
of being wrong is a single wake-on-read, matching today’s explicit reads.
Blanket anti-wake posture
Rejected. Keeping every pool probe manual-only forces interactive monitoring to behave like locked-state recovery. The locked state is the only state where braid-managed drive parking belongs, so the locked-state feature should own that rule instead of leaking it into mounted-pool UX.
See
- cli/src/tui/app.rs#update
- cli/src/tui/probe.rs#probe_pool_for_tui
- cli/src/status.rs#build_status
- cli/src/doctor.rs#check_smart_selftests
- cli/src/online_state.rs#OnlineStateOps
- ADR 015: HDD defaults
- ADR 016: Auto-suspend
- ADR 030: SMART + btrfs error reporting
LUKS Unlock: Research Notes
Reference material for braid’s unlock mechanisms. Covers gotchas, security considerations, and design rationale discovered during implementation.
USB device naming stability
/dev/sdX names are assigned by probe order and shift when devices are
added, removed, or enumerated differently across reboots. A USB stick that
was /dev/sdd can become /dev/sdc if another drive is unplugged.
/dev/disk/by-id/ paths use hardware serial numbers reported by the device
firmware and are stable across reboots and topology changes. Always use
by-id for any persistent reference to a block device.
# Unstable — changes when drives are added/removed:
/dev/sdd
# Stable — tied to hardware serial, survives reboot and topology changes:
/dev/disk/by-id/usb-Kingston_DataTraveler_3.0_E0D55EA573FCF450-0:0
See: Arch Wiki — Persistent block device naming
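A guard against persisting an unstable name can be as small as a path-prefix check. The sketch below is illustrative only; the function name is hypothetical, and it checks the path's shape, not whether the device exists.

```rust
use std::path::Path;

/// Returns true when `path` names an entry under /dev/disk/by-id/, the
/// only naming scheme the text treats as safe to persist.
/// `Path::starts_with` compares whole components, so a sibling such as
/// /dev/disk/by-identity/ does not match.
fn is_persistent_device_ref(path: &Path) -> bool {
    let root = Path::new("/dev/disk/by-id");
    path.starts_with(root) && path != root
}
```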
Passphrase file vs binary keyfile
braid enrolls and opens both the shared passphrase and the auto-unlock keyfile as LUKS keyslot secrets, so cryptsetup stretches both through the keyslot KDF (Argon2id by default for LUKS2). Neither is a raw dm-crypt volume key. The two differ in transport, byte handling, and which slot they occupy – not in whether a KDF runs.
- Passphrase (slot 0): braid trims a trailing newline and rejects embedded line breaks (cli/src/luks.rs#finalize_passphrase_bytes), then pipes the bytes to cryptsetup via --key-file=- with no --keyfile-size (a passphrase is variable-length). Designed to protect a low-entropy human-chosen secret.
- Binary keyfile (slot 1): exactly 4096 bytes read via --keyfile-size 4096, with no newline trimming. braid enforces the exact size before handing the path to cryptsetup (cli/src/luks.rs#validate_user_keyfile_path). High entropy, but still a KDF-protected keyslot secret – not a raw key.
The passphrase and the keyfile are never interchangeable – not even
byte-for-byte identical inputs – for a fundamental reason: each LUKS keyslot
carries its own salt, so slot 0 and slot 1 derive different keys from identical
KDF input. Secondarily, at the cryptsetup level the bytes that reach the KDF
can also differ: a passphrase file containing hunter2\n feeds hunter2 (the
trailing newline is trimmed) while a keyfile of the same bytes feeds hunter2\n
verbatim. That byte example is illustrative only – braid’s keyfile is always
exactly 4096 random bytes (anything else is rejected by
validate_user_keyfile_path), so the literal “same bytes” case never arises in
practice. The claim to reject is that one path skips a KDF; both run it.
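The byte-handling difference can be shown with two small functions. These are hypothetical stand-ins for finalize_passphrase_bytes and validate_user_keyfile_path, written from the description above rather than from braid's source; treating a carriage return as an embedded line break is an assumption of this sketch.

```rust
/// Size braid requires for the auto-unlock keyfile, per the text.
const KEYFILE_SIZE: usize = 4096;

/// Passphrase path: drop one trailing newline, then reject any
/// remaining line break. Assumption: '\r' counts as a line break too.
fn finalize_passphrase(mut bytes: Vec<u8>) -> Result<Vec<u8>, String> {
    if bytes.last() == Some(&b'\n') {
        bytes.pop();
    }
    if bytes.iter().any(|b| *b == b'\n' || *b == b'\r') {
        return Err("passphrase contains an embedded line break".to_string());
    }
    Ok(bytes)
}

/// Keyfile path: no trimming at all; only the exact length is checked.
fn validate_keyfile(bytes: &[u8]) -> Result<&[u8], String> {
    if bytes.len() != KEYFILE_SIZE {
        return Err(format!(
            "keyfile must be exactly {} bytes, got {}",
            KEYFILE_SIZE,
            bytes.len()
        ));
    }
    Ok(bytes)
}
```

Fed the bytes hunter2\n, the first function yields hunter2 while the second rejects the input outright because of its length, which is why the "same bytes" comparison stays hypothetical.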
A genuinely raw dm-crypt volume key would require --volume-key-file, which
braid forbids: it is in the MANAGED_LUKS_FORMAT_LONG_FLAGS denylist
(cli/src/types.rs), so braid refuses to let it reach luksFormat. The
passphrase-vs-keyfile --keyfile-size argv asymmetry is pinned by the block
comment above the test
cli/src/cmd.rs#cryptsetup_luks_open_omits_keyfile_size.
LUKS2 provides up to 32 keyslots per device; braid uses slot 0 for the passphrase and slot 1 for the keyfile.
See: cryptsetup(8) – key-file processing (the man page’s “passed directly in dm-crypt” / no-digest note is scoped to the plain device type, not LUKS), Arch Wiki – dm-crypt/Device encryption
Keyfile creation target invariant
Any braid command path that creates or overwrites braid.key in a
user-supplied directory must verify that directory exists, is a
directory, and is an active mount point both at plan time and again
immediately before writing braid.key. The plan-time check alone is
insufficient: the seconds-long window between planning and the actual
write (passphrase prompt, Argon2 --test-passphrase verify against
every pool disk, per-disk luksDump slot inventory) lets a USB device
be unmounted (manual umount, hot-unplug, systemd-automount idle
timeout) after the gate passes, which would otherwise let the keyfile
land on the host root filesystem.
This currently applies to braid enroll DIR --generate. Existing-keyfile
consumers may read from ordinary admin-controlled paths and must not require a
mount point:
- braid enroll DIR without --generate
- braid add --enroll DIR
- braid replace --enroll DIR
- braid unlock --key-file PATH
- braid.autoUnlock reading /run/braid-key/mnt/braid.key
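The "check again immediately before writing" half of the invariant can be sketched as below. Both function names are hypothetical, and the mount-point test is a heuristic chosen for this sketch (device id differs from the parent's), not necessarily what braid does: it misses bind mounts of the same filesystem, and a small window still remains between the check and the write.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// Heuristic mount-point test: a directory is a mount point when its
/// device id differs from its parent's, or when it has no parent ("/").
fn is_mount_point(dir: &Path) -> io::Result<bool> {
    let meta = fs::metadata(dir)?;
    if !meta.is_dir() {
        return Ok(false);
    }
    let canonical = fs::canonicalize(dir)?;
    match canonical.parent() {
        None => Ok(true),
        Some(parent) => Ok(fs::metadata(parent)?.dev() != meta.dev()),
    }
}

/// Re-runs the gate right before the write, so a USB stick unmounted
/// after planning cannot cause braid.key to land on the host filesystem.
fn write_key_if_still_mounted(dir: &Path, key: &[u8]) -> io::Result<()> {
    if !is_mount_point(dir)? {
        return Err(io::Error::new(
            io::ErrorKind::Other,
            "target directory is not an active mount point",
        ));
    }
    fs::write(dir.join("braid.key"), key)
}
```

A caller would run is_mount_point once at plan time and then call write_key_if_still_mounted after the slow steps (prompt, verify, slot inventory) have finished.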
Plaintext keyfile exposure (Unraid CVE)
Unraid stores the LUKS passphrase in plaintext at /root/keyfile on
persistent storage. This means anyone with root access or physical access to
the boot drive can read the encryption passphrase — the encryption is
effectively defeated at rest.
See: Unraid forum — LUKS password stored in plaintext at /root/keyfile
braid avoids this in three ways:
- No local storage. The passphrase file lives on a removable USB device, never copied to the host filesystem.
- Mount-read-unmount. The auto-unlock service mounts the USB read-only, reads the passphrase, then unmounts immediately. The passphrase is not accessible on the filesystem after unlock completes.
- Restricted mount root. The USB is mounted at /run/braid-key/mnt, under a parent directory /run/braid-key that remains 0700 root:root. Non-root users cannot traverse the parent regardless of the USB filesystem’s root inode permissions, so the passphrase file stays unreachable during the mount window.
Credential memory hygiene
Passphrase buffers in the CLI are Zeroizing<...> from read to drop
(cli/src/luks.rs::read_line_into_zeroizing,
cli/src/luks.rs::read_file_into_zeroizing), and subprocess delivery is
stdin-only with no argv argument or temporary file. Generated keyfile bytes
are zeroized after write (cli/src/enroll_key_file.rs::generate_key_file).
Passphrases and keyfile bytes never enter the Nix store; the upsmon token is
generated at runtime per decision 020,
and the USB keyfile lives only on the USB stick mounted into
/run/braid-key/mnt/ as hardened in commit df706c44875f.
Boot resilience: nofail + device-timeout
The USB mount uses nofail and x-systemd.device-timeout=Ns. Together
these guarantee the USB device never blocks boot:
- nofail: systemd does not treat a failed mount as a boot failure.
- x-systemd.device-timeout: systemd waits at most N seconds for the block device to appear, then gives up.
- noauto: the mount is not started at boot; it is triggered on-demand by the automount unit when the auto-unlock service accesses the mount point.
If the USB stick is not plugged in, the automount times out, the auto-unlock service sees no key file, logs an informational message, and exits 0. Boot continues normally; the pool stays locked for manual unlock.
Header backup workflow and messaging
LUKS header backups protect against on-disk header corruption. braid’s add, replace, and enroll_key_file create local .luksheader files at /var/lib/braid/luks-headers/<disk>.luksheader as a transient byproduct – they are not the intended backup target. The product workflow is:
- braid writes a local .luksheader during a header-mutating operation.
- The user exports the header off-system (USB, second machine, cloud key storage, etc.).
- The user removes the local copy.
braid status and the TUI warn while a local copy persists, because its continued presence on the same machine defeats the off-system backup model.
Messaging invariant
User-facing recovery, restoration, and backup-status messages – in doctor, status, unlock errors, the TUI, or any new command – must NOT reference local /var/lib/braid/luks-headers/*.luksheader files. Recovery guidance is generic: “restore from your off-system LUKS header backup if you have one.” Specifically:
- Never branch on whether a local .luksheader file exists.
- Never call Path::exists on paths.luks_headers_dir().join(...) to change user-visible advice.
- Never tell users to run cryptsetup luksHeaderRestore --header-backup-file /var/lib/braid/....
If doctor pointed users at the local files, the product would be internally inconsistent: status and the TUI warn about the same artifact doctor would tell users to depend on. Generic guidance is the right answer even if the local backup happens to be present and would technically work.
Red flags when reviewing recovery messaging: /var/lib/braid/luks-headers/, .luksheader, luks_headers_dir(), and any Path::exists against a backup path.
Open-failure header diagnosis
Unlock is two-phase. plan_open_pool probes every declared disk and
classifies it (ConfigDiskState); the disks it hands to
execute_unlock_and_mount as to_unlock are exactly the ones it found
PresentLuks – header intact, both luksUuid and luksDump succeeded at
plan time. execute_unlock_and_mount then verifies the credential and opens
each disk.
When verify or open fails, open_disks_with_credential re-probes the header
at failure time and routes the result through explain_open_failure:
- Unreadable – emit the off-system-backup guidance (per the messaging invariant above).
- Ok – the header is intact, so the original cryptsetup/verify error is passed through verbatim (e.g. a genuine wrong passphrase).
- ProbeFailed – the probe itself could not run, so braid reports that diagnosis is incomplete rather than guessing a cause.
The failure-time re-probe is deliberate, not redundant. Because the
to_unlock disks were PresentLuks by construction, the planner holds no
header-damage observation to thread in – there is nothing to reuse. The
header can still change in the plan->open window (external dd, a hardware
fault, a swapped device), and the failure-time probe is exactly what keeps a
wiped or damaged header from being misdiagnosed as a “wrong passphrase”.
probe_luks_header -> LuksHeaderState is the single header-damage
classifier; ConfigDiskState is a separate, coarse membership gateway, so the
two neither duplicate nor drift.
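The routing described in this section can be sketched as a single match. The variant names follow the text; the String payload on ProbeFailed, the function signature, and the exact message wording are assumptions of this sketch rather than braid's actual API.

```rust
/// Header classification at failure time.
#[derive(Debug, Clone, PartialEq, Eq)]
enum LuksHeaderState {
    Ok,
    Unreadable,
    /// Payload (why the probe could not run) is hypothetical.
    ProbeFailed(String),
}

/// Picks the user-facing message from the re-probed header state and
/// the original cryptsetup/verify error.
fn explain_open_failure(state: &LuksHeaderState, original_error: &str) -> String {
    match state {
        // Generic guidance only: never mention a local .luksheader path.
        LuksHeaderState::Unreadable => {
            "LUKS header is unreadable; restore from your off-system LUKS header backup if you have one"
                .to_string()
        }
        // Header intact: pass the original error through verbatim.
        LuksHeaderState::Ok => original_error.to_string(),
        // Probe could not run: say so instead of guessing a cause.
        LuksHeaderState::ProbeFailed(why) => {
            format!("{} (header diagnosis incomplete: {})", original_error, why)
        }
    }
}
```

Because the header state is an explicit input, a damaged header can never fall through to the wrong-passphrase wording by accident.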
Unparseable state-file reconciliation
There are two state files that can block normal operation when they are
unparseable: /var/lib/braid/pool.json and
/var/lib/braid/pending-op.json.
For a corrupt or off-schema pool.json, the remediation phrase is:
run 'braid discover --write' to rebuild from existing disks (with all intended pool members attached; see docs/internals/luks-unlock.md)
Confirm the attached disks are the intended pool members, then run
braid discover --write – the corrupt file is overwritten in place and the
original bytes are preserved at pool.json.corrupt-<RFC3339-UTC> next to it.
The snapshot is a hard precondition for the rebuild: if it cannot be written
(full disk, read-only state directory), discover --write refuses with
failed to snapshot existing corrupt file to ... so the corrupt original is
not destroyed; free disk space or fix permissions and retry. The sidecar is
safe to remove once you have manually copied any still-relevant prior-binding
bytes (e.g. devid for a null_underlying member). If you know the expected
member count ahead of time, pass
--expect-count <N> to fail closed against a temporarily detached disk or an
unrelated braid-labeled disk being silently admitted.
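The snapshot-before-overwrite precondition can be sketched as below. This is an illustration under stated assumptions, not braid's implementation: the function name is hypothetical, the caller supplies the RFC 3339 timestamp string, and the final write is a plain overwrite rather than an atomic rename.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Copies the corrupt file to `<name>.corrupt-<stamp>` and only then
/// overwrites it. If the snapshot cannot be written, the function
/// returns before touching the original.
fn rebuild_with_snapshot(state_file: &Path, stamp: &str, rebuilt: &[u8]) -> io::Result<PathBuf> {
    let mut name = state_file
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "state file has no name"))?
        .to_os_string();
    name.push(format!(".corrupt-{}", stamp));
    let sidecar = state_file.with_file_name(name);

    // Hard precondition: preserve the original bytes first.
    fs::copy(state_file, &sidecar).map_err(|e| {
        io::Error::new(
            e.kind(),
            format!(
                "failed to snapshot existing corrupt file to {}: {}",
                sidecar.display(),
                e
            ),
        )
    })?;

    fs::write(state_file, rebuilt)?;
    Ok(sidecar)
}
```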
Note: braid lock – the user-facing command, the braid-online.service
ExecStop path, and braid lock --dry-run alike – does NOT fail under a
missing or corrupt pool.json. It warns and proceeds with empty membership;
every observed braid-* mapper is then verified by its backing LUKS UUID
before close, so shutdown cleanup stays complete. No lock pathway hard-fails
on an unloadable pool.json.
For a healthy UUID-keyed pool.json, do not run discover --write at all
– use braid add / braid remove / braid replace to mutate membership.
discover --write is a repair tool, not a refresh; running it against a
healthy file refuses (is already a healthy UUID-keyed membership) so it does
not drop persisted devid bindings (decision 024).
For an unparseable pending-operation journal, the remediation phrase is:
Remove /var/lib/braid/pending-op.json after manual reconciliation (see docs/internals/luks-unlock.md) and re-run.
It is safe to remove pending-op.json only when one of these is true:
- The operation has not yet committed any disk-level mutation: no LUKS format was applied, no btrfs device add ran, and no fresh-format target was opened.
- The user has confirmed with braid status that the live pool already reflects the intended state and the journal entry is stale.
It is not safe to remove pending-op.json when a partially completed mutation
is in flight, such as mkfs.btrfs succeeding but btrfs device add not yet
running, or a replace paused mid-rebuild. In those cases, follow the
recovery scenario guide instead.
Replace Target Size Preflight
braid replace mirrors btrfs’s own source-size authority by issuing
BTRFS_IOC_DEV_INFO for the source devid and reading total_bytes, the same
value btrfs replace start compares against. The ioctl is wrapped behind the
BtrfsDevInfo trait so planning code can be unit-tested like the existing
Filesystem boundary; production uses LinuxBtrfsDevInfo with
nix::ioctl_readwrite!.
Target capacity is computed before opening the replacement mapper. Existing
LUKS targets read LUKS2 segment offset and size from
cryptsetup luksDump --dump-json-metadata: dynamic segments use
raw - offset with no sector_size rounding because cryptsetup sizes the
dm-crypt device that way exactly and the kernel rejects, rather than rounds, a
non-sector_size-multiple mapper, so an existing container’s capacity is exact at
any sector_size. Fixed segments use segment.size directly. Fresh targets
instead assume cryptsetup’s default 16 MiB LUKS2 offset, which holds because
braid rejects --sector-size and offset-changing format flags. If any of those
values cannot be read or parsed, or the computed target capacity is smaller
than the source total_bytes, replace refuses before writing pending-op.json,
formatting a fresh target, or opening the replacement mapper.
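The capacity arithmetic above can be sketched as follows. The type and function names are hypothetical; the three cases and the 16 MiB default come from the text. Inputs are assumed to be already-parsed byte counts, so the ioctl and the luksDump parsing are out of scope here.

```rust
/// cryptsetup's default LUKS2 data offset assumed for fresh targets.
const DEFAULT_LUKS2_OFFSET: u64 = 16 * 1024 * 1024;

/// How the replacement target's usable size is determined.
enum TargetLayout {
    /// Existing container, dynamic segment: usable = raw - offset.
    ExistingDynamic { raw_bytes: u64, offset: u64 },
    /// Existing container, fixed segment: usable = segment size.
    ExistingFixed { segment_size: u64 },
    /// Not yet formatted: assume the default offset.
    Fresh { raw_bytes: u64 },
}

/// Returns None when the subtraction would underflow, which this
/// sketch treats the same as an unreadable value.
fn target_capacity(layout: &TargetLayout) -> Option<u64> {
    match *layout {
        TargetLayout::ExistingDynamic { raw_bytes, offset } => raw_bytes.checked_sub(offset),
        TargetLayout::ExistingFixed { segment_size } => Some(segment_size),
        TargetLayout::Fresh { raw_bytes } => raw_bytes.checked_sub(DEFAULT_LUKS2_OFFSET),
    }
}

/// Refuses when capacity is unknown or smaller than the source.
fn preflight(source_total_bytes: u64, layout: &TargetLayout) -> Result<u64, String> {
    let capacity = target_capacity(layout)
        .ok_or_else(|| "target capacity could not be computed".to_string())?;
    if capacity < source_total_bytes {
        return Err(format!(
            "target capacity {} is smaller than source total_bytes {}",
            capacity, source_total_bytes
        ));
    }
    Ok(capacity)
}
```

For example, a fresh target whose raw size is exactly the source size fails the check, because the assumed 16 MiB header offset leaves it short.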
Failed unlock cleanup
If braid unlock or a recovery mount path opens one or more LUKS mappers
but fails before mounting the pool, braid fails closed for only the mappers
opened by that command invocation.
Cleanup is scoped by the LUKS open helper’s ownership result:
- Opened: braid created the mapper during this command and may close it on failure.
- AlreadyOwned: the mapper was already open at execution time, including races where an operator opened it after planning. braid must not close it.
The cleanup sequence is:
- If any opened mapper path still exists under /dev/mapper, run scoped btrfs device scan --forget <paths> for those paths. Failure warns and cleanup continues.
- Close every opened mapper with the same retry-on-busy behavior as braid lock.
When no mapper was opened, cleanup is a silent no-op: there is no
btrfs device scan --forget, no cryptsetup close, and no trailing cleanup
summary. This is the expected wrong-passphrase shape.
After attempting non-empty cleanup, stderr includes one trailing summary line:
- Success: cleanup: closed LUKS mappers opened by this command.
- Failure: cleanup failed: one or more LUKS mappers opened by this command could not be closed; run 'braid lock' after resolving the issue. First cleanup error: ...
The original unlock or mount error remains the command’s primary error; cleanup output is secondary guidance and never replaces it.
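The ownership scoping and the trailing summary line can be sketched together. The two outcome names come from the text; the function names and the injected close callback (a stand-in for the retry-on-busy close) are hypothetical, and the btrfs device scan --forget step is omitted for brevity.

```rust
/// Ownership result of the LUKS open helper.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum OpenOutcome {
    Opened,
    AlreadyOwned,
}

/// Selects the mappers this invocation created and may therefore close.
fn mappers_to_close<'a>(results: &[(&'a str, OpenOutcome)]) -> Vec<&'a str> {
    results
        .iter()
        .filter(|(_, outcome)| *outcome == OpenOutcome::Opened)
        .map(|(name, _)| *name)
        .collect()
}

/// Closes every owned mapper and returns the trailing summary line,
/// or None when nothing was opened (the silent no-op case).
fn cleanup_summary<F>(results: &[(&str, OpenOutcome)], mut close: F) -> Option<String>
where
    F: FnMut(&str) -> Result<(), String>,
{
    let owned = mappers_to_close(results);
    if owned.is_empty() {
        return None;
    }
    let mut first_error: Option<String> = None;
    for name in owned {
        // Keep going after a failure; only the first error is reported.
        if let Err(e) = close(name) {
            if first_error.is_none() {
                first_error = Some(e);
            }
        }
    }
    Some(match first_error {
        None => "cleanup: closed LUKS mappers opened by this command.".to_string(),
        Some(e) => format!(
            "cleanup failed: one or more LUKS mappers opened by this command could not be closed; run 'braid lock' after resolving the issue. First cleanup error: {}",
            e
        ),
    })
}
```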
Mount point permissions
Standard guidance for directories containing LUKS key material: the
directory should be mode 0700 owned by root, and keyfiles should be mode
0400. Since braid mounts the USB read-only at /run/braid-key/mnt, file
permissions are whatever the USB filesystem has – but the locked parent
directory /run/braid-key prevents non-root users from traversing to the
mounted files.
See: LUKS key file permissions
Device Disappearance States
When a physical drive disappears from a btrfs pool (hot-unplug, cable failure, drive death), the system passes through several states depending on how far the failure has progressed and whether the LUKS mapper is still open. Each state produces different output from btrfs filesystem show, btrfs device stats, and cryptsetup status — and braid must handle each combination correctly.
This mapping is not derivable from reading braid’s code or btrfs docs alone — it requires cross-tool knowledge that’s easy to get wrong.
State Table
| State | btrfs filesystem show | btrfs device stats | cryptsetup status | braid maps to |
|---|---|---|---|---|
| Healthy | path /dev/mapper/X | [/dev/mapper/X] | device: /dev/sdY | pool.devices |
| Null-underlying | path /dev/mapper/X | [/dev/mapper/X] | device: (null) | pool.null_underlying |
| MISSING with path | path /dev/mapper/X MISSING | [/dev/mapper/X] (??) | not queried | missing_devids only |
| Fully gone | path MISSING | [devid:N] | not queried | missing_devids |
Empirical note: SATA hot-unplug on real hardware enters Null-underlying immediately and stays there for at least 5 minutes without I/O pressure. We have not yet observed the MISSING-with-path state in practice. See real-world/sata-hot-unplug.md for full test results.
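The state table can be read as a small decision function. The sketch below is a simplification for illustration, not braid's probe code: the inputs are hypothetical, pre-parsed observations, and the function returns None for combinations the table does not cover.

```rust
/// The four rows of the state table.
#[derive(Debug, PartialEq, Eq)]
enum DeviceState {
    Healthy,
    NullUnderlying,
    MissingWithPath,
    FullyGone,
}

/// - `show_path`: mapper path from btrfs filesystem show, if one was printed
/// - `show_missing`: whether that row carried the MISSING marker
/// - `backing_device`: the device: value from cryptsetup status, or
///   None when cryptsetup was not queried
fn classify(
    show_path: Option<&str>,
    show_missing: bool,
    backing_device: Option<&str>,
) -> Option<DeviceState> {
    match (show_path, show_missing, backing_device) {
        (Some(_), false, Some("(null)")) => Some(DeviceState::NullUnderlying),
        (Some(_), false, Some(_)) => Some(DeviceState::Healthy),
        (Some(_), true, _) => Some(DeviceState::MissingWithPath),
        (None, true, _) => Some(DeviceState::FullyGone),
        _ => None,
    }
}
```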
Healthy
Normal operation. Physical drive is present, LUKS mapper is open and points to the underlying block device, btrfs sees the device.
Null-underlying
Hot-unplug while mounted. The LUKS mapper (/dev/mapper/braid-X) is still open in device-mapper, but the backing block device has vanished. cryptsetup status reports device: (null). btrfs still sees the mapper path — it doesn’t know the physical drive is gone until I/O fails.
braid handles this correctly: probe_pool detects the (null) device, records it in pool.null_underlying, and monitor includes its devid in alert_missing_devids. The stats row reports both the mapper path and the devid; the alert pipeline pairs by devid directly.
Post-UUID-identity rule: when a mapper is null-underlying, the live LUKS UUID is
not observable from the missing backing device. braid may bind that live mapper
back to membership through persisted DiskMember.devid, but only for this
restricted case. The persisted devid is prior-binding state, not display
authority; status output still uses live btrfs stats for displayed devids.
MISSING with path
btrfs has registered the device as missing, but still remembers which mapper path it had. btrfs filesystem show appends MISSING to the path. The parser puts the devid into missing_devids but discards the path. probe_pool never processes this device (it only iterates show.devices), so it doesn’t appear in pool.devices or pool.null_underlying.
Handling: btrfs device stats rows always carry a mandatory devid field, so the alert pipeline identifies the row by devid regardless of which path string btrfs reports ([/dev/mapper/X] or [devid:N]). The MissingDevice alert is generated independently from missing_devids. Rows for alert-local missing devids are skipped for BtrfsDeviceErrors, while braid ack still snapshots their counters by devid so old counts do not re-alert if the member returns.
The same restricted devid fallback applies to membership correlation: when
btrfs reports a missing device only by devid, braid can resolve the member whose
persisted DiskMember.devid matches. It must not infer membership by parsing a
mapper name or LUKS label.
Uncertainty: We haven’t empirically confirmed which path string btrfs device stats reports for a device in this state – the ?? in the table marks this. The answer no longer affects correctness (devid drives the lookup), but it would still be useful empirical data.
Fully gone
Device is completely absent — either the LUKS mapper was torn down, or the device was missing at mount time (degraded mount). btrfs filesystem show reports bare path MISSING (no mapper path). The pinned btrfs-progs renders the missing-device stats path as [devid:N] (cmds/device.c#print_device_stat_string); [<missing disk>] is an older btrfs rendering. braid does not depend on either string: the parser ignores the device field and keeps the row’s devid and counters.
At this point there is no mapper and no observable LUKS UUID. Mutating commands
that target the missing device, such as remove-missing and missing-path
replace, resolve the requested btrfs devid through UUID-keyed membership and
fail closed if no persisted member carries that devid.
Transitions
The typical progression for a hot-unplug:
Healthy → Null-underlying → MISSING with path(?) → Fully gone
The transitions depend on timing, I/O activity, and whether the kernel tears down the LUKS mapper. A brief unplug-replug might only reach Null-underlying before recovering. A permanent removal eventually reaches Fully gone.
The transition from Null-underlying to MISSING with path is the least understood. It likely happens when btrfs attempts I/O on the device and gets errors, then marks it missing — but the mapper path is still in kernel memory so btrfs remembers it.
Code Pointers
- probe_pool: cli/src/probe.rs – builds pool.devices, pool.null_underlying, pool.missing_devids
- btrfs filesystem show parser: cli/src/parse/btrfs_filesystem_show.rs – filters MISSING devices from devices list
- btrfs device stats parser: cli/src/parse/btrfs_device_stats.rs – propagates devid as the btrfs-native stats row key and ignores the display-only device string
- alert computation: cli/src/alert.rs – compute_alert_state and snapshot_current key by dev.devid from the parsed stats row; compute_alert_state skips alert-local missing devids for BtrfsDeviceErrors
smartd alert conditions
Reference for what triggers smartd to call the notification script.
braid’s current smartd config
-a -o on -S on -m <nomailer> -M exec ${smartdAlertScript}
-a expands to: -H -f -t -l error -l selftest -l selfteststs -C 197 -U 198
-o on and -S on are non-monitoring config flags (enable offline testing and attribute autosave on the drive).
Wired in modules/braid/monitor.nix (search for smartdAlertScript).
SATA: conditions that fire the alert script
smartd polls every 30 minutes. Each condition has a SMARTD_FAILTYPE value passed to the script.
| SMARTD_FAILTYPE | Directive | Trigger |
|---|---|---|
| Health | -H | Overall SMART health status = FAILING |
| Usage | -f | Any Usage (Old_age) attribute value <= vendor threshold |
| ErrorCount | -l error | ATA error log count increased since last poll |
| SelfTest | -l selftest | New self-test failures detected |
| CurrentPendingSector | -C 197 | Non-zero raw value on attr 197 |
| OfflineUncorrectableSector | -U 198 | Non-zero raw value on attr 198 |
| FailedHealthCheck | -H | SMART health command itself failed |
| FailedReadSmartData | | Could not read SMART attribute data |
| FailedReadSmartErrorLog | | Could not read SMART error log |
| FailedReadSmartSelfTestLog | | Could not read self-test log |
| FailedOpenDevice | | open() failed – device disappeared |
| Temperature | -W | Temperature >= CRIT threshold (NOT in -a, must be added explicitly) |
SATA: what -a does NOT alert on
These are only logged to syslog, not sent to the script:
- Reallocated_Sector_Ct (5) raw value increases – only alerted if value crosses the vendor threshold (via -f). To alert on raw value changes, add -R 5!.
- Reported_Uncorrect (187), End-to-End_Error (184), Reallocated_Event_Count (196) – same: threshold breach only via -f, no raw-value alerts.
- Temperature – not monitored at all without -W DIFF,INFO,CRIT.
- Prefail/Usage attribute value changes – -t (= -p -u) logs these to syslog at LOG_INFO, but does not fire the script.
SATA: syslog-only directives (no script trigger)
| Directive | What it monitors |
|---|---|
| -p | Prefail attribute value changes (LOG_INFO) |
| -u | Usage attribute value changes (LOG_INFO) |
| -t | All attribute changes (= -p -u) |
| -r ID | Report raw value alongside normalized (informational) |
| -R ID (without !) | Track raw value changes (LOG_INFO, no email) |
| -R ID! (with !) | Track raw value changes (LOG_CRIT + fires script) |
| -l offlinests | Offline Data Collection status changes (LOG_CRIT, no email) |
| -l selfteststs | Self-Test execution status changes (LOG_CRIT, no email) |
NVMe: how -a works differently
NVMe has a standardized health model – no vendor-specific attribute IDs or thresholds. The ATA-only parts of -a (-C 197, -U 198, -o on, -S on) are silently ignored.
NVMe conditions that fire the alert script
| SMARTD_FAILTYPE | Directive | Trigger |
|---|---|---|
| Health | -H | Critical Warning byte != 0 (any bit set) |
| Usage | -f | Percentage Used > 95% or Media and Data Integrity Errors increased |
| ErrorCount | -l error | Error Information Log Entries count increased (device-related errors only, since smartmontools 7.4) |
| SelfTest | -l selftest | New self-test failures (requires smartmontools 7.5+) |
| FailedHealthCheck | -H | SMART health command itself failed |
| FailedReadSmartData | | Could not read SMART data |
| FailedReadSmartErrorLog | | Could not read error log |
| FailedReadSmartSelfTestLog | | Could not read self-test log |
| FailedOpenDevice | | open() failed – device disappeared |
The Critical Warning byte (-H)
A bitmask where any bit set fires the alert:
| Bit | Meaning |
|---|---|
| 0 | Available spare fallen below threshold |
| 1 | Temperature above/below acceptable range |
| 2 | Reliability degraded (excessive writes beyond warranty) |
| 3 | Media placed in read-only mode |
| 4 | Volatile memory backup (power-loss protection capacitor) failed |
As of smartmontools 7.5, -H MASK (hex) can ignore specific bits, e.g. -H 0xfb ignores bit 2 (reliability/warranty warning).
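The bitmask and the effect of a mask such as 0xfb can be illustrated with a short decoder. This is a sketch of the arithmetic only; it models the mask as a plain bitwise AND and is not a description of smartd's internals. The label strings are paraphrased from the table above.

```rust
/// Labels for critical-warning bits 0 through 4, in bit order.
const WARNING_BITS: [&str; 5] = [
    "available spare below threshold",
    "temperature out of range",
    "reliability degraded",
    "media in read-only mode",
    "volatile memory backup failed",
];

/// Returns the labels of the bits that remain set after applying
/// `mask`; a cleared mask bit means "ignore this warning".
fn active_warnings(critical_warning: u8, mask: u8) -> Vec<&'static str> {
    let effective = critical_warning & mask;
    WARNING_BITS
        .iter()
        .enumerate()
        .filter(|(bit, _)| (effective & (1u8 << *bit)) != 0)
        .map(|(_, label)| *label)
        .collect()
}
```

With mask 0xfb (binary 1111 1011), a byte that has only bit 2 set decodes to an empty list, while a byte with bits 0 and 2 set still reports the spare-capacity warning.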
NVMe syslog-only tracking (no script trigger)
| Directive | What it monitors |
|---|---|
| -p | Available Spare changes (LOG_INFO) |
| -u | Percentage Used and Media Errors changes (LOG_INFO) |
| -t | All of the above (= -p -u) |
| -l selfteststs | Self-test execution status changes (LOG_CRIT, no email) |
NVMe vs SATA summary
NVMe monitoring is more straightforward because the spec defines exactly what “unhealthy” means, whereas SATA relies on vendor-specific attribute definitions and generously-set thresholds. The same -a config line works for both – smartd adapts per device type.
SATA attributes worth monitoring
Based on real Seagate SATA output.
Reliable indicators (unambiguous, no vendor-encoding issues)
| ID | Name | Notes |
|---|---|---|
| 5 | Reallocated_Sector_Ct | Sectors remapped due to read errors |
| 184 | End-to-End_Error | Internal data path integrity failure. Non-zero = serious. |
| 187 | Reported_Uncorrect | Uncorrectable errors reported to host. Non-zero = data loss occurred. |
| 196 | Reallocated_Event_Count | Remap operations (complements attr 5). Non-zero = active reallocation. |
| 197 | Current_Pending_Sector | Sectors waiting to be remapped |
| 198 | Offline_Uncorrectable | Sectors unreadable during offline test |
Useful but with caveats
| ID | Name | Notes |
|---|---|---|
| 10 | Spin_Retry_Count | Failed spin-up. Non-zero = mechanical trouble. |
| 188 | Command_Timeout | High values = dying drive, but some timeouts normal during power events. |
Avoid using raw values for comparison
| ID | Name | Why |
|---|---|---|
| 1 | Raw_Read_Error_Rate | Seagate packs composite value (errors in lower bits, total ops in upper). Raw number is meaningless for threshold comparison. Other vendors vary too. |
| 7 | Seek_Error_Rate | Same Seagate composite encoding. |
Not disk errors
| ID | Name | Why |
|---|---|---|
| 191 | G-Sense_Error_Rate | Shock sensor. Low values normal for a moved drive. |
| 193 | Load_Cycle_Count | Wear indicator, not an error. |
| 199 | UDMA_CRC_Error_Count | Almost always a cable/connection problem, not the drive. |
Relationship to braid’s live SMART classifier (SmartEvidence)
braid runs its own live SMART probe: parse_smartctl (in
cli/src/parse/smartctl.rs) builds a SmartEvidence from smartctl -H -A --json output, reading the raw values of 3 ATA attributes:
Reallocated_Sector_Ct, Current_Pending_Sector, Offline_Uncorrectable
(plus the NVMe health-information log on NVMe drives). This verdict now feeds
both braid status and the TUI – the same per-disk probe surfaces in
status output (the smart JSON object and the SMART: text line) and in the
TUI disk-detail panel.
This is complementary to smartd, not a replacement: smartd handles real-time
alerts (with its own set of checks), while braid’s classifier gives
at-a-glance diagnostic status. Critically, the live classifier is diagnostic
only – a degraded SmartEvidence never raises an AlertCause. smartd
remains the sole SMART alert source (it writes the smartd-alert flag that
drives AlertCause::SmartdAlert); see
ADR 014 and
ADR 030. The two
SMART signals don’t need to be identical but should cover the same ground
between them.
SATA Hot-Unplug and Replug Behavior
Empirical observations from physical hardware testing. Validates the device state model in tool-behavior/device-disappearance.md.
Hardware
- Machine: Silverstone NAS (hunk)
- Drives: 3x SATA HDD in btrfs RAID1 over LUKS
- Disk removed: ccc (ST500LM021, devid 3, wwn-0x5000c500ba0a8b52, LUKS label braid-ccc)
- OS: NixOS with braid module
Detection signals and latencies
How fast each layer notices the disk is gone, and what passive signals are available without user-initiated I/O.
| Signal | Latency | Passive? | Programmatic detection |
|---|---|---|---|
| ata*: SATA link down (kernel journal) | Instant | Yes | journalctl -kf pattern match |
| udev remove event | ~11s (after SATA retries) | Yes | udev rule on ACTION=="remove" |
| /dev/disk/by-id/wwn-* symlink disappears | ~11s (udev cleans it) | Yes | inotify on /dev/disk/by-id/ |
| cryptsetup status shows device: (null) | ~11s | Yes | poll cryptsetup status |
| btrfs write errors (periodic commit) | ~26s | Yes | journalctl -kf pattern match |
| btrfs device stats shows nonzero errors | ~26s+ | Needs query | btrfs device stats |
Key takeaway: the kernel journal and udev events are the fastest passive signals. btrfs is completely oblivious until its next periodic commit (~30s default), but then notices on its own without user-initiated I/O.
The udev remove event is especially useful – it includes ID_WWN and ID_FS_LABEL (e.g. braid-ccc), so a udev rule can immediately identify which braid disk disappeared.
What does NOT react
- LUKS mapper (/dev/mapper/braid-ccc): stays as a zombie. cryptsetup status still says “active” but the backing device: becomes (null). I/O through it fails.
- btrfs filesystem show: continues to list all 3 devices with paths and sizes even after errors. Never reports the device as missing from this command alone.
udev remove event (raw)
Arrives after the SATA retries complete (~11s). Includes disk identity:
KERNEL[1395.061297] remove /devices/pci0000:00/0000:00:01.2/0000:02:00.1/ata2/host1/target1:0:0/1:0:0:0/block/sda (block)
ACTION=remove
DEVNAME=/dev/sda
DEVTYPE=disk
UDEV [1395.091944] remove /devices/pci0000:00/0000:00:01.2/0000:02:00.1/ata2/host1/target1:0:0/1:0:0:0/block/sda (block)
ACTION=remove
DEVNAME=/dev/sda
ID_WWN=0x5000c500ba0a8b52
ID_FS_LABEL=braid-ccc
ID_FS_TYPE=crypto_LUKS
DEVLINKS=... /dev/disk/by-id/wwn-0x5000c500ba0a8b52 ... /dev/disk/by-label/braid-ccc ...
cryptsetup status (zombie mapper)
After the block device is gone, the LUKS mapper lingers but its backing device is null:
/dev/mapper/braid-ccc is active and is in use.
type: n/a
cipher: aes-xts-plain64
device: (null)
mode: read/write
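The zombie state can be recognised from this output alone. The sketch below is an illustration under stated assumptions, not the detection in cli/src/probe.rs: it assumes the first line carries the "is active" marker and the remaining lines are "key: value" pairs, as in the sample above.

```python
def parse_cryptsetup_status(text):
    """Split `cryptsetup status` output into (active, fields)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    # The first line reads "<mapper> is active ..." for an open mapper.
    active = bool(lines) and " is active" in lines[0]
    fields = {}
    for line in lines[1:]:
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return active, fields


def is_null_underlying(text):
    """True when the mapper is still active but its backing device is gone."""
    active, fields = parse_cryptsetup_status(text)
    return active and fields.get("device") == "(null)"


ZOMBIE = """\
/dev/mapper/braid-ccc is active and is in use.
  type:    n/a
  cipher:  aes-xts-plain64
  device:  (null)
  mode:    read/write
"""

print(is_null_underlying(ZOMBIE))
```

Checking both conditions avoids treating a cleanly closed mapper as a lost disk.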
btrfs device stats (after errors)
[/dev/mapper/braid-ccc].write_io_errs 10
[/dev/mapper/braid-ccc].read_io_errs 0
[/dev/mapper/braid-ccc].flush_io_errs 1
[/dev/mapper/braid-ccc].corruption_errs 0
[/dev/mapper/braid-ccc].generation_errs 0
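Because this signal needs an explicit query, a poller has to parse the counters itself. The following is a small sketch, assuming the `[device].counter value` line shape shown above; the helper names are hypothetical and this is not the parser in cli/src/parse/btrfs_device_stats.rs.

```python
import re

# One line per counter: "[<device>].<counter> <value>"
STAT_LINE = re.compile(r"^\[(?P<device>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)$")


def parse_device_stats(text):
    """Return {device: {counter: value}} from `btrfs device stats` output."""
    stats = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line.strip())
        if match:
            stats.setdefault(match["device"], {})[match["counter"]] = int(match["value"])
    return stats


def devices_with_errors(stats):
    """Keep only devices with at least one nonzero counter, and only those counters."""
    return {
        device: {name: value for name, value in counters.items() if value}
        for device, counters in stats.items()
        if any(counters.values())
    }


SAMPLE_STATS = """\
[/dev/mapper/braid-ccc].write_io_errs    10
[/dev/mapper/braid-ccc].read_io_errs     0
[/dev/mapper/braid-ccc].flush_io_errs    1
[/dev/mapper/braid-ccc].corruption_errs  0
[/dev/mapper/braid-ccc].generation_errs  0
"""

print(devices_with_errors(parse_device_stats(SAMPLE_STATS)))
```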
Test: SATA Hot-Unplug (disk removed while pool mounted)
Immediate state (seconds after unplug)
| Tool | Output |
|---|---|
| btrfs filesystem show | Still lists path /dev/mapper/braid-ccc – no MISSING suffix |
| btrfs device stats | Still lists [/dev/mapper/braid-ccc] – not <missing disk> |
| cryptsetup status braid-ccc | active and is in use, device: (null) |
| braid status | DEGRADED, ccc = missing |
| braid monitor | Exit 1 (alert), clean MissingDevice { devid: 3 } |
Conclusion: Immediate hot-unplug enters the null-underlying state. btrfs doesn’t know the device is gone — it still reports the mapper path. Only cryptsetup detects the loss (underlying block device vanished). braid’s null-underlying detection handles this correctly.
State after ~5 minutes (still unplugged)
No change. btrfs filesystem show still reports the path without MISSING. btrfs doesn’t transition to the MISSING state on its own without I/O pressure. The null-underlying state is stable for at least minutes.
Kernel perspective (dmesg)
[ 3431s] ata1: SATA link down (SStatus 0 SControl 300)
[ 3437s] ata1: SATA link down — limiting SATA link speed
[ 3442s] ata1.00: disable device, detaching (SCSI 0:0:0:0)
[ 3442s] sd 0:0:0:0: [sdc] Synchronize Cache failed: DID_BAD_TARGET
Kernel detects the link-down within seconds and detaches the SCSI device. The LUKS mapper (dm-2) stays open — dm-crypt doesn’t tear down when the underlying device vanishes.
Test: SATA Replug (disk reconnected)
State after replug
| Tool | Output |
|---|---|
| btrfs filesystem show | Still lists path /dev/mapper/braid-ccc (unchanged) |
| btrfs device stats | Still lists [/dev/mapper/braid-ccc] (unchanged) |
| cryptsetup status braid-ccc | Still device: (null) – does NOT recover |
| braid status | ccc still shows as missing / UNKNOWN |
| Physical device | Back as /dev/sde (was /dev/sdc before unplug) |
Key finding: The LUKS mapper does not recover from null-underlying after replug. The dm-crypt target was /dev/sdc, but the kernel re-attached the disk as /dev/sde. The mapper is permanently broken until closed and reopened.
Kernel perspective (dmesg)
[ 3744s] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 3744s] ata1.00: ATA-8: ST500LM021-1KJ152
[ 3744s] sd 0:0:0:0: [sde] 976773168 512-byte logical blocks
[ 3744s] sd 0:0:0:0: [sde] Attached SCSI disk
Kernel sees the disk on the same ATA port but assigns a new SCSI device node (sde instead of sdc).
Recovery path
The broken LUKS mapper cannot self-heal. Recovery requires:
- braid ack to silence the alert
- Reboot → braid unlock (reopens LUKS mappers using stable /dev/disk/by-id/ paths)
This is correct behavior — braid uses by-id paths for LUKS open, so a reboot always rebinds to the right device regardless of kernel device node assignment.
Unanswered Questions
- MISSING-with-path state: We never observed btrfs filesystem show report path /dev/mapper/X MISSING during these tests. This state may require sustained I/O errors or a degraded mount (reboot with disk missing). The ?? in the device state table for what btrfs device stats reports in this state remains unverified.
- Time to MISSING transition: btrfs didn’t transition from null-underlying to MISSING within 5 minutes of idle. It may require write pressure or a longer timeout.
- Replug with same device node: We didn’t test whether cryptsetup recovers if the kernel assigns the same /dev/sdX path after replug. Unlikely in practice since the kernel increments device letters.
Validated Code Paths
Changes to these should prompt re-verification of this document:
- cli/src/probe.rs – probe_pool() null-underlying detection (lines 190-206)
- cli/src/monitor.rs – alert-local missing devids union (missing_devids ∪ null_underlying devids)
- cli/src/alert.rs – compute_alert_state / snapshot_current (devid-keyed; no path-to-devid map)
- cli/src/parse/btrfs_filesystem_show.rs – MISSING device filtering (line 116)
- cli/src/parse/btrfs_device_stats.rs – devid propagation and <missing disk> / devid:<n> sentinel handling
btrfs balance: profile conversions and block group types
Block group types
btrfs has three block group types, each with an independent RAID profile:
| Type | Contents | Default (1 device) | Default (2+ devices) |
|---|---|---|---|
| data | File contents | single | single |
| metadata | Inodes, directory entries, extent tree | dup | raid1 |
| system | Chunk tree (maps virtual → physical addresses) | dup | raid1 |
System chunks follow metadata automatically
When converting profiles with btrfs balance start -mconvert=<profile>,
system chunks are converted alongside metadata. You do not need to pass
-sconvert=<profile> separately.
The -s flag exists for converting system chunks independently of metadata,
which requires -f because btrfs considers it dangerous.
This means our standard conversion commands are complete:
# single → RAID1 (after adding 2nd device)
btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/storage
# RAID1 → single (before removing last redundant device)
btrfs balance start -dconvert=single -mconvert=dup -f /mnt/storage
Note the asymmetry in the second command: data converts to single, but
metadata converts to dup, not single. A one-device pool keeps two
same-disk copies of metadata (and system chunks), matching what
mkfs.btrfs lays down for a fresh single-device filesystem (see the table
above). -mconvert=single would leave metadata with a single unprotected
copy. The -f is required here because reducing metadata from RAID1 to
dup lowers redundancy – btrfs refuses that without a force flag – not
because of the -s independent-conversion case discussed above.
Why system chunks matter
System chunks contain the chunk tree — the structure that maps virtual addresses to physical device locations. Losing the only copy of a system chunk usually means losing the entire filesystem. On a multi-device pool, having system chunks in RAID1 ensures this map survives a single device failure.
Sources
- Balance — BTRFS documentation
- btrfs-balance(8)
- linux-btrfs: safe/necessary to balance system chunks?
btrfs balance: the soft flag
What soft does
soft is a per-type modifier for convert= filters. From btrfs-progs
Documentation/btrfs-balance.rst (version 6.19.1, tag v6.19.1, commit
fa79dbea32d39ac0ae41a88a079013c7ad2a8a58):
“When doing convert from one profile to another and soft mode is on, chunks that
already have the target profile are left untouched.”
btrfs balance start -dconvert=raid1,soft -mconvert=raid1,soft /mnt/storage
Without soft, every block group is rewritten regardless of its current
profile. With soft, only block groups whose profile differs from the target
are touched. The switch is per-type, so -dconvert and -mconvert apply it
independently.
soft keys on the profile tag alone, not on data distribution: a chunk tagged
raid1 is skipped even if both copies happen to live on a subset of the
devices. That distinction is exactly why braid uses hard convert in one place
and soft in another.
Where braid uses hard vs soft
braid issues two different RAID1 convert-balances. The choice of soft is
deliberate in each.
Hard convert – growing the pool (braid add, 3rd+ device)
braid add of a 3rd-or-later device runs a HARD -dconvert=raid1
(pool_balance_raid1, emitting BtrfsBalanceRaid1). Soft would be wrong here:
- Pool has devices A, B – all chunks are raid1 across A and B.
- Add device C.
- -dconvert=raid1,soft – every chunk is already raid1, so soft skips them all. Balance is a no-op.
- Device C sits empty. Existing data still has zero copies on C.
A hard rewrite rewrites every chunk, redistributing copies across all three
devices – which is the whole point of balancing after a device add. (A 1->2 add
converts the existing single chunks either way, so the distinction only bites
at the 3rd+ device.)
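The A, B, C walk-through can be replayed with a toy model. This is a deliberately simplified sketch, not btrfs's allocator: chunks are equal-sized, and a rewritten chunk is placed on the two least-used devices, which only approximates btrfs preferring the devices with the most unallocated space. All names are hypothetical.

```python
def balance_convert_raid1(chunks, devices, soft):
    """Toy model of a convert-to-raid1 balance; returns the new chunk list."""
    used = {device: 0 for device in devices}
    for chunk in chunks:
        for device in chunk["devices"]:
            used[device] += 1

    result = []
    for chunk in chunks:
        if soft and chunk["profile"] == "raid1":
            # soft keys on the profile tag only: already raid1, leave untouched.
            result.append(chunk)
            continue
        # Hard convert (or a non-raid1 chunk): free the old copies, then
        # place two new copies on the least-used devices.
        for device in chunk["devices"]:
            used[device] -= 1
        targets = sorted(devices, key=lambda d: (used[d], d))[:2]
        for device in targets:
            used[device] += 1
        result.append({"profile": "raid1", "devices": tuple(targets)})
    return result


def copies_on(chunks, device):
    return sum(device in chunk["devices"] for chunk in chunks)


# Pool was A+B, all chunks raid1; device C has just been added.
pool = [{"profile": "raid1", "devices": ("A", "B")} for _ in range(6)]
for soft in (True, False):
    after = balance_convert_raid1(pool, ["A", "B", "C"], soft=soft)
    print("soft" if soft else "hard", "-> copies on C:", copies_on(after, "C"))
```

In this model the soft run leaves C without a single copy, while the hard run spreads copies onto it, which is the reason given above for the hard convert after a 3rd+ device add.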
Soft convert – converting leftover single chunks
btrfs allocates a single chunk (one copy) only when it cannot place two copies
on two devices – i.e. when a RAID1 pool has fewer than two devices present for
allocation. The common case is a 2-disk pool mounted degraded on its one
surviving device: new writes land as single. A larger pool that still has two
survivors keeps allocating raid1 – a 3-disk pool degraded to two creates no
single chunks – so this conversion is only ever needed for chunks written
while fewer than two devices were available.
Once the pool is whole again, those single chunks must be converted back to
raid1 to restore redundancy. braid runs a SOFT -dconvert=raid1,soft
(pool_balance_raid1_soft, emitting BtrfsBalanceRaid1Soft): it converts
exactly the single chunks and skips everything already raid1. Because soft
skips matching chunks, the balance is idempotent and cheap – a near no-op when
there is nothing to convert – so braid runs it as cleanup without first
checking whether any single chunks exist.
braid issues this soft balance from two code paths:
- Live restore – maybe_restore_raid1 (cli/src/pool.rs), invoked by remove-missing and by replace’s missing path once the operation clears the last missing device.
- Recover replay – replay_owed_raid1_maintenance (cli/src/recover.rs), described below.
replace itself uses btrfs replace start (atomic), not add+balance+remove
(see ADR-001), so this soft balance is the only convert-balance in the replace
path.
Skip – degraded add (missing member present)
braid add into a pool that still has a missing member runs NO convert
balance at all. The post-add present-device count can already be >= 2 (a
2-disk RAID1 with one member missing, plus the fresh disk), which would
otherwise trip the hard convert above; braid gates it off on
missing_count > 0 and surfaces a single [skip] note instead. The skip is
applied symmetrically in cli/src/add.rs: plan_add pushes one
PreviewNote::Skip, and the preview step builder (AddWorkPlan::render_steps)
and the execute balance gate (AddPlan::execute) both carry the same
missing_count == 0 condition so dry-run and real-run agree.
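The gate described above reduces to a small decision, sketched here with hypothetical names. It mirrors only the conditions stated in this section (missing_count > 0 skips; otherwise two or more present devices trigger the hard convert) and is not the code in cli/src/add.rs. The "none" result for a one-device pool is an assumption.

```python
def add_balance_action(present_after_add, missing_count):
    """Decide which convert-balance follows a device add (illustrative only)."""
    if missing_count > 0:
        # Degraded add: defer redundancy restoration to remove-missing/replace.
        return "skip"
    if present_after_add >= 2:
        return "hard-convert-raid1"
    # Assumption: a one-device pool has nothing to convert to raid1.
    return "none"


# A 2-disk RAID1 with one member missing, plus one fresh disk:
print(add_balance_action(present_after_add=2, missing_count=1))
# A healthy 2-disk pool growing to 3 disks:
print(add_balance_action(present_after_add=3, missing_count=0))
```

Evaluating the same function in both the preview and the execute path is one way to keep dry-run and real-run in agreement.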
This is a deliberate deferral, not a hazard fix. The hard convert does
succeed on a degraded pool today – btrfs device add works on a degraded
mount and the convert rewrites every chunk across the present devices – but it
rewrites all data through the allocator while the pool has no redundancy, a
longer and less-targeted operation than the purpose-built btrfs replace.
braid instead defers redundancy restoration to the repair step:
remove-missing (which relocates data onto the new disk and runs the soft
balance above) or replace. The soft convert, by contrast, is left running
even on a degraded pool – it only converts single -> raid1 and never
rewrites existing raid1 chunks, so it cannot do a full degraded rewrite and
is safe and beneficial there.
Skipping at add also makes the degraded-add interrupt paths converge. With no hard balance issued, a completed degraded add and every recover path end at the same state: device added, pool still degraded, redundancy deferred to the repair step. Before this change the paths diverged: a completed degraded add restored redundancy via the hard balance, but recover could only safely replay owed RAID1 maintenance when no paused balance survived the interruption. Skipping at add closes that divergent path by making degraded-add recovery end in the same deferred-repair state.
btrfs-progs guidance backs the deferral. btrfs-balance.rst (in Sources)
recommends you “use :command:btrfs replace or :command:btrfs device remove
to handle the failing/missing device first.” We lean on that as general
guidance, not a strong prohibition – its acute warning is narrower, about
converting to a profile with lower redundancy (RAID1 -> SINGLE) with a
present-but-failing device, milder than our convert to raid1 with a
cleanly-missing member.
Recover replay
After a forced shutdown mid-mutation, braid recover replays owed RAID1
maintenance only if btrfs balance status reports no active balance:
Warning
Replaying a crash-paused RAID1 balance can underflow btrfs block-group accounting and silently halve redundancy. recover preserves pending-op.json instead of automating recovery when the balance state is paused, running, or unknown.
On any pool with two or more devices, the idle/no-paused path runs the soft
balance above to catch single chunks an interrupted balance left behind. The
idempotent ,soft filter makes this safe even when nothing needs converting.
This replay fires for an interrupted add when the balance state is idle – the
new disk is already in the pool, so re-running braid add would refuse, and
recover finishes the job so the operator is not left with single chunks – and
for the idle/no-paused owed post-maintenance step of remove-missing and
replace.
braid remove is deliberately not part of this replay. It is the only mutation
whose pre-mutation phase can issue a balance – the RAID1 -> single conversion
in the 2->1 case. A paused balance found while recovering a remove may be that
unfinished conversion-to-single, not owed RAID1 maintenance, so recover neither
resumes nor soft-replays it. Resuming it would finish converting to single
without removing the device, then clear the journal, silently halving
redundancy. Recover instead directs the operator to re-run braid remove.
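The replay rules in this section can be summarised as a decision function. This is a sketch under stated assumptions: the enum, the operation names, and the returned labels are hypothetical, and the function only encodes the rules described here rather than reproducing cli/src/recover.rs.

```python
from enum import Enum


class BalanceState(Enum):
    IDLE = "idle"
    PAUSED = "paused"
    RUNNING = "running"
    UNKNOWN = "unknown"


def recover_raid1_action(op, balance_state, device_count):
    """Pick the recover behaviour for owed RAID1 maintenance (illustrative)."""
    if op == "remove":
        # A balance found here may be the unfinished RAID1 -> single
        # conversion, so it is neither resumed nor soft-replayed.
        return "direct-operator-to-rerun-remove"
    if balance_state is not BalanceState.IDLE:
        # Paused, running, or unknown: keep pending-op.json, automate nothing.
        return "preserve-journal"
    if device_count >= 2:
        # Idempotent soft convert catches leftover single chunks.
        return "soft-balance"
    return "nothing-owed"


print(recover_raid1_action("add", BalanceState.IDLE, 3))
print(recover_raid1_action("replace", BalanceState.PAUSED, 3))
print(recover_raid1_action("remove", BalanceState.PAUSED, 2))
```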
Sources
- btrfs-progs Documentation/btrfs-balance.rst, version 6.19.1, tag v6.19.1, commit fa79dbea32d39ac0ae41a88a079013c7ad2a8a58 – soft filter semantics.
- btrfs-progs Documentation/btrfs-man5.rst, version 6.19.1, tag v6.19.1, commit fa79dbea32d39ac0ae41a88a079013c7ad2a8a58 – degraded mounts and mixed block group profiles.
- braid: ADR-001 btrfs RAID1 (replacement strategy, add+balance+remove rejected), design principles (degraded restore), and the replace / remove-missing command docs.
ENOSPC vs hang: reproducing btrfs device remove failures in VMs
Background
btrfs device remove missing has two failure modes when surviving devices
lack space for relocation. Both are bad, but the second is catastrophic.
Failure mode 1: instant ENOSPC
Conditions: surviving devices have zero (or near-zero) unallocated space.
btrfs can’t even begin relocating block groups. It fails immediately:
ERROR: error removing device 'missing': No space left on device
Filesystem stays healthy and writable. Annoying but recoverable.
How to reproduce in a VM: 3×512MiB disks, fill to 100% capacity, kill
one disk. btrfs device remove missing fails in under a second.
Failure mode 2: partial relocation → transaction abort → forced read-only
Conditions: surviving devices have SOME unallocated space (hundreds of MiB) but not enough to relocate ALL block groups from the dead device.
btrfs starts relocating, successfully moves some block groups (consuming the free space), then hits ENOSPC mid-transaction on a subsequent block group. The transaction abort forces the entire filesystem read-only:
BTRFS info: relocating block group 4761583616 flags data|raid1
BTRFS info: found 20 extents, stage: move data extents
BTRFS info: found 20 extents, stage: update data pointers
BTRFS info: relocating block group 3419406336 flags metadata|raid1
BTRFS: Transaction aborted (error -28)
BTRFS: error in __btrfs_free_extent: errno=-28 No space left
BTRFS info: forced readonly
The error reported to the user is “Read-only file system” — the ENOSPC is buried in dmesg. The filesystem is destroyed (forced read-only) and requires remounting or rebooting to recover.
On real hardware with slow USB drives, btrfs doesn’t crash quickly — it
spends hours doing I/O, throttled by writeback queuing (wbt_wait), retrying
the same block groups before eventually aborting. In a VM with fast virtual
disks, the same sequence completes in ~40 seconds.
What makes the difference
The variable is whether btrfs can begin relocating:
| Free space on survivors | btrfs behavior | Outcome |
|---|---|---|
| ~0 | Can’t start → instant ENOSPC | Filesystem OK |
| Some but not enough | Starts, partially succeeds, then ENOSPC mid-transaction | Filesystem destroyed (forced read-only) |
| Enough | Completes relocation | Success |
The dangerous middle case is the one that happened in the real incident (3×8GiB USB drives, ~80% full, one died).
How braid avoids this
braid’s mutation preflight refuses these removals — before the pending-op
journal is written — whenever it can prove the survivors lack the space to
absorb the target’s allocations. The degraded failure-mode-2 path is fully
guarded: remove-missing and the 2→1 eviction are fail-closed, so an
operator using braid does not reach the catastrophic path above. The
healthy >=2-survivor case is intentionally warn-and-proceed on an
unprovable check, because it falls through to btrfs’s clean failure mode
1, never the mode-2 abort. Per path:
- remove-missing – the degraded failure-mode-2 scenario exactly. Computes RAID1 chunk-pair capacity on the survivors and refuses when it is below the chunks allocated on the missing device. Fail-closed: any probe or parse uncertainty also refuses (cli/src/preflight.rs::check_raid1_relocation_space, wired in cli/src/remove_missing.rs).
- remove evicting to a single survivor (2→1) – RAID1 no longer applies, so braid instead checks the lone survivor can hold the post-conversion data + 2 × metadata + 2 × system (single + DUP profile). Fail-closed (cli/src/preflight.rs#check_single_survivor_capacity). Enforced at plan time and re-validated as a pre-journal gate in cli/src/remove.rs#RemovePlan::execute, closing the drift window where the pool keeps taking writes while the operator idles at the confirmation prompt: an over-committed survivor is then refused before the irreversible -f balance, still with no pending-op.json stranded.
- remove with >=2 survivors (healthy) – same RAID1 relocation check, but warn-and-proceed on probe/parse uncertainty. A best-effort miss here falls through to btrfs device remove, which hits the clean failure mode 1 (instant ENOSPC), not the failure-mode-2 abort, so the filesystem stays intact.
- replace is not subject to this failure mode. btrfs replace rebuilds onto the new disk instead of relocating onto survivors; its preflight refuses a new disk smaller than the one being replaced (cli/src/preflight.rs::check_replace_target_capacity).
braid status and braid doctor surface a proactive advisory
(cli/src/capacity.rs::enospc_risk_advisory) one disk-loss before a pool
enters this danger zone.
The policy and its rationale are owned by ADR 012’s “ENOSPC pre-flight
check” section (docs/design/decisions/012-intent-cli.md). See also
docs/commands/remove-missing.md and the braid status ENOSPC advisory
(docs/commands/status.md).
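The two space checks in this section can be sketched as arithmetic. The pair-capacity formula below is the usual upper bound for mirrored allocation (each chunk needs two different devices) and ignores block-group granularity. It is an assumption made for illustration, not a copy of check_raid1_relocation_space or check_single_survivor_capacity.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def raid1_pair_capacity(unallocated):
    """Upper bound on RAID1 logical bytes that fit in the per-device free space.

    Every byte needs a copy on two different devices, so the bound is the
    smaller of half the total and what the other devices can mirror.
    """
    if len(unallocated) < 2:
        return 0
    total = sum(unallocated)
    return min(total // 2, total - max(unallocated))


def can_relocate_missing(unallocated_survivors, allocated_on_missing):
    """True when the survivors can absorb the missing device's chunks."""
    return raid1_pair_capacity(unallocated_survivors) >= allocated_on_missing


def single_survivor_fits(survivor_size, data, metadata, system):
    """2->1 eviction: data goes single, metadata and system go DUP."""
    return data + 2 * metadata + 2 * system <= survivor_size


# Uneven survivors: the larger gap cannot be mirrored anywhere.
print(raid1_pair_capacity([1 * MIB, 288 * MIB]) // MIB)
print(can_relocate_missing([400 * MIB, 400 * MIB], 256 * MIB))
print(single_survivor_fits(8 * GIB, data=6 * GIB, metadata=512 * MIB, system=8 * MIB))
```

The first call shows why total free space is a poor predictor: one nearly full survivor caps the pair capacity at its own free space.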
Reproducing the hang/crash in a VM
The tricky part is getting btrfs to land in the “some but not enough” zone. Two challenges:
1. btrfs allocates unevenly across devices
Writing 2GiB of data to a 3-device RAID1 pool doesn’t give you ~667MiB allocated per device. btrfs allocates block groups in pairs (for RAID1), and the pair selection isn’t perfectly balanced. In testing:
disk1: Unallocated 1.00MiB ← nearly full
disk2: Unallocated 288.88MiB ← some room
With one device at ~0 free, btrfs can’t relocate anything there → instant ENOSPC (failure mode 1). To get failure mode 2, BOTH survivors need meaningful free space.
2. Block group granularity
btrfs allocates space in block groups (256MiB on small devices, 1GiB on
large ones). A single dd write of 200MiB might or might not trigger a new
block group allocation. Writing in smaller chunks (50MiB) gives btrfs more
allocation decisions, improving the chance of even distribution.
Working recipe (what the test does)
- Use 4GiB disks – large enough for btrfs to create multiple data block groups per device, giving room for partial relocation.
- Adaptive fill with small chunks – write 50MiB at a time, check btrfs device usage --raw after each write, stop when the minimum unallocated across all online devices drops below 800MiB. This targets the sweet spot: both survivors have 300-800MiB free.
- Use --raw for parsing – btrfs device usage displays values in human units (MiB, GiB) depending on magnitude. --raw gives bytes, avoiding unit-parsing bugs.
- Kill disk3, mount degraded, attempt btrfs device remove missing – btrfs starts relocating, succeeds on one block group (~38s of I/O in VM), then crashes on the next with transaction abort.
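The stop condition of the adaptive fill can be sketched as follows. The sample layout of btrfs device usage --raw output is an assumption for illustration (one "Unallocated:" line in bytes per device), and the helper names are hypothetical rather than taken from the test.

```python
import re

FILL_TARGET = 800 * 1024 * 1024  # stop once any device drops below 800MiB

UNALLOCATED = re.compile(r"^[ \t]*Unallocated:[ \t]+(\d+)[ \t]*$", re.MULTILINE)


def min_unallocated(usage_raw_text):
    """Smallest per-device unallocated byte count in --raw output."""
    values = [int(match.group(1)) for match in UNALLOCATED.finditer(usage_raw_text)]
    if not values:
        # Fail loudly instead of treating "no match" as "no free space".
        raise ValueError("no Unallocated lines parsed; was --raw passed?")
    return min(values)


def should_stop_filling(usage_raw_text):
    return min_unallocated(usage_raw_text) < FILL_TARGET


# Assumed output shape, two devices shown.
SAMPLE_USAGE = """\
/dev/vdb, ID: 1
   Device size:            4294967296
   Data,RAID1:             3221225472
   Unallocated:             805306368

/dev/vdc, ID: 2
   Device size:            4294967296
   Data,RAID1:             2147483648
   Unallocated:            1879048192
"""

print(min_unallocated(SAMPLE_USAGE), should_stop_filling(SAMPLE_USAGE))
```

Raising on an empty match set guards against the loop ending early for the wrong reason, for example when the values are printed in human units.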
What didn’t work
- 512MiB disks filled to 100%: instant ENOSPC (failure mode 1). No free space for btrfs to even begin.
- 2GiB disks with 200MiB write chunks: uneven allocation left one survivor with 1MiB free → instant ENOSPC again.
- 2GiB disks with adaptive fill: same uneven allocation problem. Not enough total capacity for btrfs to distribute block groups evenly across 3 device pairs.
- Parsing btrfs device usage without --raw: values display as MiB or GiB depending on size. On fresh 4GiB disks, unallocated shows as GiB; a regex matching only MiB found zero values → fill loop stopped immediately.
Test files
- tests/repro/btrfs-remove-enospc.nix / .py – failure mode 1 (instant ENOSPC, 3×512MiB)
- tests/repro/btrfs-remove-enospc-crash.nix / .py – failure mode 2 (partial relocation crash, 3×4GiB)
These are repro tests that document actual btrfs behavior, not TDD tests.
They assert the real outcomes: instant ENOSPC with surviving filesystem, or
transaction abort with forced read-only. They invoke raw btrfs device remove missing rather than braid precisely because braid’s preflight (see
“How braid avoids this” above) refuses the operation under these conditions —
reproducing the unguarded btrfs behavior requires bypassing it. They live in
tests/repro/ — a folder reserved for tests that reproduce real-world
scenarios for our records.
LUKS sector size and btrfs
Summary
braid does not pass --sector-size to cryptsetup luksFormat, and it
rejects operator attempts to set it. With the flag omitted, cryptsetup
auto-detects the encryption sector size from each device – and that
auto-detected value is already the optimal one for the device – so braid
never chooses a sector size itself.
What auto-detect picks
When --sector-size is omitted, cryptsetup sizes the LUKS2 encryption
sector to the device’s physical sector:
The encryption sector size is set based on the underlying data device if not specified explicitly. For native 4096-byte physical sector devices, it is set to 4096 bytes. For 4096/512e (4096-byte physical sector size with 512-byte sector emulation), it is set to 4096 bytes. For drives reporting only a 512-byte physical sector size, it is set to 512 bytes.
– cryptsetup 2.8.6, man/common_options.adoc (LUKSFORMAT branch)
The rule, in short:
- 4Kn (native 4096) and 512e (4096/512e) drives -> 4096-byte LUKS sectors
- drives reporting a 512-byte physical sector -> 512-byte LUKS sectors
On our hardware:
- NAS drives: 8TB+ SATA HDDs (4Kn or 512e) -> 4096-byte LUKS sectors, matching the physical sector.
- Test drives: USB sticks and VM virtio disks report 512-byte sectors -> 512-byte LUKS sectors. The committed luksDump fixtures (cli/tests/fixtures/nixos-26.05/cryptsetup-luks-dump.json and its nixos-unstable mirror) record "sector_size":512 for exactly this reason: they capture VM disks, not the NAS hardware.
Why braid doesn’t override it
Two reasons:
- Auto-detect already yields the optimal value. Setting --sector-size explicitly could at best re-specify what cryptsetup would pick anyway, while adding a format-time parameter that cannot change without re-encrypting the device. There is nothing to gain.
- An override could make braid’s capacity estimate unsafe. braid rejects --sector-size passed as a --luks-format-arg override (see cli/src/types.rs#LuksFormatExtraOpts::parse); replace lists it among the on-disk-layout flags it refuses. A non-default sector size can shift the fresh-LUKS payload offset, and braid’s capacity check for a fresh target assumes cryptsetup’s default offset. The scope here is deliberately fresh targets only: the replace-target size preflight covers existing containers, whose capacity is read from the LUKS2 segment and is exact at any sector size.
Aside: even 512-byte LUKS sectors are harmless under btrfs
This section covers the 512-byte LUKS sector case – the test drives
above, and the historical worry that motivated --sector-size 4096 in
the first place. It does not describe the NAS, which gets 4096-byte
LUKS sectors from auto-detect. Even at 512-byte sectors, btrfs sees no
read-modify-write penalty.
The three layers
btrfs (always 4096-byte blocks)
-> LUKS (512 or 4096-byte sectors)
-> physical disk (512 or 4096-byte sectors)
Why --sector-size 4096 exists
Read-modify-write amplification happens at the physical disk when something writes less than a full physical sector. Example: writing a single 512-byte LUKS sector to a 4096-byte-physical-sector disk forces the disk to read 4096 bytes, modify 512, and write 4096 back.
Why btrfs avoids it even at 512-byte LUKS sectors
btrfs never writes anything smaller than 4096 bytes. Take a 4096-byte btrfs write landing on a LUKS device with 512-byte sectors. dm-crypt encrypts that write as 8 x 512-byte crypto sectors internally – but the internal crypto-sector count is not the I/O count:
dm-crypt does not split the write – it allocates one clone bio for the entire write and submits it downstream as a single bio:
clone = crypt_alloc_buffer(io, io->base_bio->bi_iter.bi_size);
– Linux 6.18.33, drivers/md/dm-crypt.c (kcryptd_crypt_write_convert)
The physical disk therefore receives a full 4096-byte write – no read-modify-write penalty. The only overhead is CPU: 8 IV computations and 8 smaller AES operations instead of 1. With AES-NI doing multiple GB/s, that is negligible next to spinning-disk speeds.
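The arithmetic behind this argument, as a small sketch. It assumes the LUKS payload starts on a physical-sector boundary, so that an aligned write at the dm-crypt layer stays aligned on the disk; the function names are hypothetical.

```python
def crypto_sectors(write_bytes, luks_sector):
    """How many encryption sectors dm-crypt processes for one write."""
    return write_bytes // luks_sector


def needs_read_modify_write(offset, length, physical_sector):
    """A write forces RMW when it does not cover whole physical sectors.

    Assumes offset is measured from a physical-sector-aligned payload start.
    """
    return offset % physical_sector != 0 or length % physical_sector != 0


# A 4096-byte btrfs write through 512-byte LUKS sectors onto a 4096-byte disk:
print(crypto_sectors(4096, 512))                # crypto work items, not I/Os
print(needs_read_modify_write(8192, 4096, 4096))

# A lone 512-byte write to the same disk (what btrfs never issues):
print(needs_read_modify_write(512, 512, 4096))
```

The crypto-sector count only changes CPU work; whether the disk has to read-modify-write depends solely on the size and alignment of the write that reaches it.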
When --sector-size 4096 would matter
Filesystems that can issue sub-4096 writes: ext4 with 1K blocks, raw
dd, database engines doing 512-byte writes. btrfs is not one of them.
btrfs dev_replace resume-on-mount and the recover relock cycle
Background
A btrfs replace interrupted mid-flight by an unclean crash leaves the
on-disk dev_replace_item in STARTED. On the next mount, the kernel sees
that state and resumes the replace from the on-disk cursor.
For braid, this matters during braid recover: the command may be the first
thing to mount the pool after the crash, so it is also the thing that triggers
the kernel’s resume-on-mount path.
Kernel resume-on-mount behavior
btrfs_resume_dev_replace_async runs as a detached kthread. umount does not
wait for that worker.
The worker commits the post-completion devid swap to disk correctly, but it
does not update the in-memory btrfs_fs_devices for the mount session that
triggered the resume. A probe taken from that session reads stale topology: a
phantom MISSING devid 0 plus both the source and target devices. In the
captured failure, that meant five device entries and braid status reporting
DEGRADED even though every disk was online.
Why the LUKS close+reopen is load-bearing
The important empirical result is narrower than “remount after replace”:
umount + btrfs device scan --forget + remount is not enough if the dm devices
stay alive. That cycle can leave the cached fs_devices attached to the live dm
devices, so the next probe still sees the stale topology from the original
resume-triggering mount.
Only tearing down and recreating the dm devices forces the kernel to re-read the
chunk tree from disk and build a fresh fs_devices that reflects the
post-resume on-disk state.
What braid recover does
Recover splits this into two explicit work actions:
RecoverWorkAction::WaitForKernelReplace and
RecoverWorkAction::RemountCycle.
First, cli/src/recover.rs#wait_for_kernel_replace_to_finish polls
btrfs replace status until the kernel reports Finished or no replace is in
progress. Running is intentionally unbounded because interrupting the kernel
worker would strand the same recovery problem. Suspended or unparseable output
fails closed and preserves the journal.
Then cli/src/recover.rs#relock_and_remount runs the full relock cycle:
umount, btrfs device scan --forget, close the LUKS membership union, reopen
the pool through the standard plan_open_pool flow, and remount through the
standard executor. The second mount sees the completed on-disk replace with a
fresh fs_devices.
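The wait step can be pictured as the loop below. This is a sketch with hypothetical status labels ("running", "finished", "none"); it does not parse real btrfs replace status output and is not the code in cli/src/recover.rs. The poll and sleep callables are injected so the loop can be exercised without a pool.

```python
import time


def wait_for_replace(poll_status, interval_s=5.0, sleep=time.sleep):
    """Poll until the kernel replace is done; fail closed on anything unclear.

    poll_status() returns a status label; the labels are assumptions made
    for this sketch.
    """
    while True:
        status = poll_status()
        if status in ("finished", "none"):
            return "proceed-to-relock"
        if status != "running":
            # Suspended or unparseable: keep the journal, stop automating.
            return "fail-closed"
        # Running is waited out with no deadline, as described above.
        sleep(interval_s)


# Order of the relock cycle that follows a successful wait.
RELOCK_STEPS = [
    "umount",
    "btrfs device scan --forget",
    "close LUKS mappers",
    "reopen pool",
    "mount",
]

statuses = iter(["running", "running", "finished"])
print(wait_for_replace(lambda: next(statuses), sleep=lambda _s: None))
```

Keeping the LUKS close and reopen between the scan and the remount is the part this note identifies as load-bearing.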
Coverage
tests/repro/btrfs-replace-interrupted-mid-flight.py pins the unclean-kill
path end-to-end. It starts a real braid replace, kills the VM mid-flight,
boots again, runs braid recover, and asserts that the resumed replace drains,
pool.json swaps in the new disk, the old disk is evicted, and a later
braid lock; braid unlock cycle stays clean.
Path B: v6.19+ freeze/signal cancellation
The unclean-kill repro does not cover the v6.19+ freeze and signal cancellation
path inside the btrfs replace worker loop. An unclean kernel kill bypasses the
in-loop try_to_freeze and fatal_signal_pending checks entirely.
A sibling repro test is needed when kernel >= 6.19 reaches NixOS stable. Its
sequencing depends on whether braid replace should inhibit suspend for the
operation’s duration; that policy question is orthogonal to the crash path this
note documents.
Development
braid is developed test-first with NixOS VM tests. Tests run on macOS via nix.linux-builder.enable = true in nix-darwin (checks are checks.aarch64-darwin).
Dev shell
Enter the pinned braid dev shell before running local commands:
nix develop
The shell includes the Rust toolchain (cargo, rustc, rustfmt, clippy, rust-analyzer), just, and braid’s parser-critical/runtime tools (btrfs-progs, cryptsetup, util-linux, nut).
That shell is Linux-only – it bundles the storage tools (btrfs-progs, cryptsetup, util-linux, nut), which don’t evaluate on darwin – so nix develop resolves only on a Linux host. On macOS, run VM tests through the linux-builder and build the CLI with nix build .#braid-cli-unwrapped (below); nix develop .#docs works on macOS but carries only the docs toolchain (mdbook).
Test workflow
# run one test while iterating
just test-vm braid-add-disk
# run a few specific tests
just test-vm braid-add-disk braid-remove-disk
# verbose VM logs (only when non-verbose output doesn't explain the failure)
just test-vm braid-add-disk -v
# full suite before finishing
just test-vm
# repro tests only
just test-repro
# all tests including repro
just test-all
# rust unit tests
just test-rust
# parser compatibility canary (CLI parsers against live VM tool output)
just test-parsers
Run tests without -v by default. Only add -v to a specific failing test when the output doesn’t explain the failure. Never run just test-vm -v (all tests verbose) – too much output to be useful.
Unstable lane
Early-warning tests against nixos-unstable. Failures signal upcoming changes, not a contract violation.
# VM tests against nixos-unstable
just test-vm braid-add-disk --unstable
just test-all-unstable
# fixture capture + golden tests against unstable
just capture-all-fixtures-unstable
just test-rust-unstable
Faster tests with tmpfs
VM tests create qcow2 disk images that hammer your SSD. Mount a dedicated tmpfs so builds happen in RAM:
# NixOS config
fileSystems."/tmp-braid" = {
device = "tmpfs";
fsType = "tmpfs";
options = [ "size=16G" "mode=0755" ];
};
just test-vm automatically passes --option build-dir /tmp-braid when the mount exists.
Building the CLI
nix build .#braid-cli-unwrapped
The wrapped .#braid and default put btrfs/luks tooling on PATH and are Linux-only; on macOS build the pure-Rust braid-cli-unwrapped. Rust source lives in cli/.
Upgrading dependencies
braid targets the latest stable NixOS release (nixos-26.05) and uses whatever package versions that channel provides – no custom pins or overlays. Versions are locked to a specific nixpkgs commit in flake.lock.
1. Update nixpkgs
nix flake update
2. Check versions
nix eval --raw nixpkgs#btrfs-progs.version
nix eval --raw nixpkgs#systemd.version
nix eval --raw nixpkgs#autosuspend.version
nix eval --raw nixpkgs#cryptsetup.version
nix eval --raw nixpkgs#util-linux.version
nix eval --raw nixpkgs#smartmontools.version
3. Update vendored reference source
reference/ contains upstream source used for code-level reference (parser behavior, output formats, config schemas). Most entries track flake.lock through nixpkgs-pinned tools; nix-crate tracks Cargo.lock. After a flake update, refresh the nixpkgs-pinned entries to match the new versions:
just fetch-references
4. Refresh fixtures and run tests
A nixpkgs bump can change tool output formats, which breaks parsers. Run the full validation sequence:
just capture-all-fixtures
just test-rust
just test-parsers
just test-vm
5. Update vendored crate sources
After any change that touches the nix line in cli/Cargo.toml, or any cargo update-driven bump to the nix package in Cargo.lock, refresh the crate source:
just fetch-references nix-crate
Releasing
Copy-pasteable runbook for cutting a braid release. The design rationale (why the
release branch is the channel, why the x86_64-linux build runs in CI, why
no-follows is the consumer default, the public-repo trust model) lives in
ADR 029.
Prerequisites (one-time)
- The public Cachix cache braid exists; you have captured its public key (braid.cachix.org-1:...). CACHIX_AUTH_TOKEN (a push token for that cache) is set as a GitHub Actions repo secret.
- The release branch is not branch-protected against the Actions token – CI fast-forwards it with GITHUB_TOKEN.
- The releaser can push directly to master. cargo release commits the bump and pushes it to master (not via a PR), so any required-PR ruleset on master must exempt the releaser, or just release fails mid-run after the local commit/tag.
- Run from nix develop .#release (provides cargo-release, cargo, gh, just on the Mac; the default devShell is Linux-only and has no cargo).
Before releasing
braid does not run the NixOS VM suite in CI, and neither just release nor
release.yml requires a VM result. VM coverage is a manual, per-release choice:
when a release warrants it, run the suite outside the release automation –
either locally:
just test-vm
or by triggering test.yml manually via workflow_dispatch (its only active
trigger). Do not re-enable test.yml’s push/pull_request triggers, and do
not wire just release to depend on it.
just test-rust (fast, no VM) does gate the release automatically: release.yml
re-runs it on the tag, and just release runs a local compile gate
(nix build braid-cli-unwrapped) before tagging.
Normal release
From nix develop .#release:
just release <patch|minor|major>
This bumps cli/Cargo.toml + the braid-cli entry in Cargo.lock, commits
chore(release): vX.Y.Z, tags vX.Y.Z, and pushes master + the tag. The tag
triggers release.yml. Follow CI:
gh run list --workflow release.yml
gh run watch <run-id>
release.yml builds the x86_64-linux binary, pushes it to the braid cache,
creates the GitHub release, and – last – fast-forwards the release branch (the
consumer channel). Because the FF is last, consumers see the new rev only after
the cache is warm and the release object exists.
Pre-1.0 bumps are plain semver:
| Level | From | To |
|---|---|---|
| patch | 0.0.1 | 0.0.2 |
| minor | 0.0.1 | 0.1.0 |
| major | 0.0.1 | 1.0.0 |
So minor jumps to 0.1.0, not 0.0.x – expected, not a surprise.
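The table follows from the usual semver rule: the chosen component is incremented and every lower component resets to zero. A minimal sketch of that rule, assuming a hypothetical `bump` helper and `Level` enum (illustration only, not how cargo-release is implemented):

```rust
/// Release level, mirroring the `<patch|minor|major>` argument.
/// Hypothetical type for illustration.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Level {
    Patch,
    Minor,
    Major,
}

/// Plain semver bump on a (major, minor, patch) triple: increment the
/// chosen component and reset every lower component to zero.
pub fn bump(version: (u64, u64, u64), level: Level) -> (u64, u64, u64) {
    let (major, minor, patch) = version;
    match level {
        Level::Patch => (major, minor, patch + 1),
        Level::Minor => (major, minor + 1, 0),
        Level::Major => (major + 1, 0, 0),
    }
}
```

Applied to `(0, 0, 1)`, the three levels reproduce the three rows of the table.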
Consumers upgrade by bumping the lock to the new release tip:
nix flake update braid # then nixos-rebuild switch
(A consumer may wrap this in a shortcut, e.g. a braid:upgrade shell function.)
One active release tag at a time. release.yml sets queue: max, so a burst
of tags all queue (up to 100, FIFO by the time each starts waiting on the
concurrency group) and none is dropped. But that order is wait-start time, not
dispatch time, so pushing the next tag before the prior release.yml run finishes
risks two tags starting out of dispatch order – the older one’s release
fast-forward then fails as a non-fast-forward. That outcome is benign for
consumers (release only ever moves forward) but shows a red run. So push (or
just release) one tag at a time.
Release notes
The GitHub release body is generated by git-cliff from commit subjects, grouped
by conventional-commit type (config in cliff.toml). Named types render into
stable sections such as Features, Bug Fixes, Documentation, Tests, CI, Build,
and Chores; anything unmatched lands in Other. The first release (v0.0.1) is a
one-time exception and intentionally has a blank release body; later genuinely
empty rendered ranges get a _No notable changes._ placeholder.
Preview the next release’s notes before tagging:
just changelog
(renders commits since the last tag). Before the first v* tag exists, this
prints nothing to match the blank v0.0.1 release body. Editing a release body
never affects consumers – the release branch fast-forward is what publishes.
The first release
The first release is not special: it is just release patch, the same flow as
every later release. The in-tree version is the pre-release 0.0.0, so the first
just release patch cuts v0.0.1 (0.0.0 -> 0.0.1); all later runs bump from
0.0.1.
Two first-run-only things happen for free, with no extra steps:
- release.yml’s final git push origin <commit>:refs/heads/release creates the release branch (the ref does not exist yet, so the first push makes it), and
- gh release create cuts the first GitHub release (no pre-existing release required). The v0.0.1 release body is intentionally blank instead of a whole-history changelog; git-cliff notes begin with later releases.
Because CI has no VM gate, run the behavioral suite locally before this first cut:
just test-vm
just test-rust
If release CI fails
First rule: never re-run just release after a tag exists – that would bump
again.
- Transient or config-only failure – re-run the existing workflow:
  gh run rerun <run-id>
  gh run watch <run-id>
- Bad tagged code – fix master, then move the same version tag to the fixed commit:
  git push origin :refs/tags/vX.Y.Z
  git tag -d vX.Y.Z
  git tag -a vX.Y.Z -m vX.Y.Z
  git push origin vX.Y.Z
Why this is safe: the release fast-forward is the last step, so until it runs
release has not advanced and consumers cannot nix flake update to the new rev
– a failure at any earlier step (test, build, cache, or gh release create)
leaves consumers untouched. Re-running converges: the cache push and
gh release create are idempotent, and the FF re-pushes the same commit.
Testing notes
Test conventions and NixOS VM test framework reference for braid. The short three-bullet preamble contract (Intent / Why it exists / Scenario) lives in AGENTS.md at the repo root; everything else – the literal preamble form, the flake.nix registration rule, framework gotchas, and patterns – is here. For the lifecycle test suite see tests/module/systemd-lifecycle.py. For the Rust-level TUI view snapshot tests (insta-based, run via just test-rust), see tui-snapshots.md.
Conventions
Preamble: literal // line-comment form
Every test’s preamble is a contiguous block of // line comments directly above the test item.
- Intent — what behavior this test verifies (or tries to verify)
- Why it exists — what risk/regression this protects against
- Scenario — the real-world user/system story this models, especially the concrete bug or incident that inspired the test
// Intent: one-line statement of the behavior verified.
// Why it exists: the regression risk this protects against, ideally with
// reference to the incident or commit that prompted it.
// Scenario: the concrete real-world sequence the test models.
#[test]
fn the_test() { ... }
New VM tests must register in flake.nix
just test-vm and just test-all build whatever is registered under checks.<system> in flake.nix – there is no default per-test list in the justfile. When adding a new tests/cli/*.nix or tests/module/*.nix, also add a matching pkgs.testers.nixosTest (import ./tests/cli/<name>.nix { braid = linuxCrane.braid; }) entry to flake.nix. An unregistered test sits in the tree but never runs under nix flake check.
VM-test framework gotchas
just test-repro requires the full repro- prefix
just test-repro <name> and just test-vm <name> pass the test name verbatim to nix as a final attribute selector. The reproChecks flake output is built by filterAttrs keeping the repro- prefix in the filtered set, so the attribute name passed to just test-repro must be exactly the name in flake.nix, prefix and all.
# correct
just test-repro repro-btrfs-replace-interrupted-mid-flight
# wrong -- fails with "flake ... does not provide attribute ... reproChecks.aarch64-darwin.btrfs-replace-interrupted-mid-flight"
just test-repro btrfs-replace-interrupted-mid-flight
The test-vm checks set strips entries with the repro- prefix, so test-vm test names do not have a prefix (e.g. cli-recover-replace-completed).
NixOS test driver wraps every command with set -euo pipefail
The driver auto-prepends set -euo pipefail to every machine.succeed / machine.execute command before sending it to the VM. This is invisible from the test script but has real consequences for chained commands.
Symptom: A chain like ... ; wait $pid_loser ; echo $? > /tmp/exit-a ; ... silently aborts when wait returns non-zero. The exit-code file is never written, and the next subtest assertion fails with cat: /tmp/exit-a: No such file or directory – pointing at the wrong layer.
Idiom for capturing a non-zero exit without aborting:
ec_a=0 ; wait $pid_a || ec_a=$? ; echo $ec_a > /tmp/exit-a
The || consumes the non-zero into the variable, so errexit does not fire. Works for any command whose non-zero exit is expected (wait, grep, diff, etc.). This matters most in concurrent-process tests where one process is expected to exit non-zero (fail-fast lock contention, expected error paths).
Python f-strings without placeholders fail the build-time linter
NixOS VM test scripts are linted at build time. f-strings without {placeholder} variables (e.g. f"Missing foo in config") cause a build failure: f-string is missing placeholders.
In tests/**/*.py, never use f"..." without at least one {variable} inside. Use "literal" + variable for assertion messages that include dynamic values.
Patterns
Regression test quality
Regression tests must fail when the bug is reintroduced. Test the layer where production failed, not a downstream parser or helper that only proves later code works when given correct input.
For error propagation, assert the typed variant and payload. Use exact rendered
strings only for tests whose purpose is to lock Display or user-facing
output. If a change reclassifies an error, production and tests should call the
same mapping helper; do not hand-build the target variant in the test.
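A minimal sketch of the typed-variant rule, under stated assumptions: `PoolError`, `missing_device`, and `check_devices` are hypothetical names, not braid's real types. Production code and the test both go through the same mapping helper, and only a test whose purpose is to lock `Display` compares the rendered string.

```rust
use std::fmt;

/// Hypothetical error type; braid's real variants may differ.
#[derive(Debug, PartialEq, Eq)]
pub enum PoolError {
    MissingDevice { devid: u64 },
}

impl fmt::Display for PoolError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            PoolError::MissingDevice { devid } => write!(f, "device {devid} is missing"),
        }
    }
}

/// Shared mapping helper: production code and tests both call this, so a
/// reclassification happens in exactly one place.
pub fn missing_device(devid: u64) -> PoolError {
    PoolError::MissingDevice { devid }
}

/// Returns the first expected device id that is not present.
pub fn check_devices(expected: &[u64], present: &[u64]) -> Result<(), PoolError> {
    for &devid in expected {
        if !present.contains(&devid) {
            return Err(missing_device(devid));
        }
    }
    Ok(())
}
```

A propagation test would assert `check_devices(&[1, 2], &[1]) == Err(missing_device(2))`, which pins both the variant and its payload without hand-building the variant in the test.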
For user-visible CLI output or control-flow bugs, prefer a CLI/VM test that
drives the real command. If stdout vs stderr matters, capture them separately
with >stdout 2>stderr; merged streams do not pin routing. Render or preview
helpers that form a user-visible boundary need exact-output coverage for every
branch, including no-op branches.
Keep repro tests focused. If adjacent behavior already has dedicated coverage, cite that test instead of bundling another phase into a repro whose failure would become ambiguous.
When a dead test has a name that points at a real user-visible contract, replace it with a real regression test by default. Deleting the dead test turns bad coverage into no coverage.
Live-tool behavior locks
When braid code is changed to depend on a specific external-tool behavior – a particular exit code, a particular output wording, a particular return-value path – mocked unit tests prove the classifier is correct given the assumed behavior, but they do NOT prove the tool still behaves that way. A nixpkgs bump that changed cryptsetup’s exit-code contract would silently misclassify in production while every mocked test still passed.
Whenever a plan introduces a classifier of the form exit_code == <N> or stderr.contains("<wording>") against an external tool, identify (or add) a live-tool repro/VM test that asserts the same code/wording directly. List that test in the plan’s verification section as a required gate. If the live-tool test would be non-trivial to add, pause and reconsider whether the classifier is actually robust.
This is the same family as braid’s parser-compatibility lanes (just test-parsers, just test-rust-unstable, see parser compatibility) – those lock the parser against tool-output drift; a behavior-lock test locks an exit-code or wording classifier against the same drift surface. Reference example: tests/repro/cryptsetup-close-mounted.py asserts exit_code == 5 for busy-close and exit_code == 4 for already-closed, behavior-locking the assumption that the retry classifier in cli/src/lock.rs depends on.
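To make the gap concrete, here is a sketch of an exit-code classifier of the kind described. The `CloseOutcome` enum and `classify_close` function are hypothetical, not the code in cli/src/lock.rs. The two constants are the assumption taken from the reference example above (5 for busy-close, 4 for already-closed).

```rust
/// Assumed cryptsetup exit codes, taken from the reference example in the
/// text. Nothing in this file proves the tool still returns them.
const CRYPTSETUP_ALREADY_CLOSED: i32 = 4;
const CRYPTSETUP_BUSY: i32 = 5;

/// Hypothetical outcome type for a `cryptsetup close` call.
#[derive(Debug, PartialEq, Eq)]
pub enum CloseOutcome {
    Closed,
    AlreadyClosed,
    RetryBusy,
    Failed(i32),
}

/// Maps an exit code to an outcome. A unit test of this function only
/// checks the mapping, given the assumed codes.
pub fn classify_close(exit_code: i32) -> CloseOutcome {
    match exit_code {
        0 => CloseOutcome::Closed,
        CRYPTSETUP_ALREADY_CLOSED => CloseOutcome::AlreadyClosed,
        CRYPTSETUP_BUSY => CloseOutcome::RetryBusy,
        other => CloseOutcome::Failed(other),
    }
}
```

Every assertion against `classify_close` keeps passing even if the tool starts returning a different code for a busy device. Only the live-tool test can catch that drift.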
VM and command test design
Before inventing VM setup for missing disks, degraded mounts, ENOSPC, hotplug,
or similar storage state, search tests/cli/, tests/repro/, and tests/hw/
for an existing pattern and reuse it where it fits.
Before proposing a VM test for a mutating command, search the same area for existing notes that say a shape is infeasible, and read sibling tests to learn which seams already exist.
For ordering invariants like “persist state before post-operation maintenance”, prefer a deterministic command-layer failure-injection test: allow the persistence step to succeed, force the next maintenance step to fail, then assert the persisted state is current and the journal still exists.
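A minimal sketch of such a failure-injection test, assuming a hypothetical `Steps` seam and `finish_operation` function (braid's real command layer is not shown here). Persistence succeeds, maintenance is forced to fail, and the assertions check what is left behind.

```rust
/// Hypothetical seam for the two steps whose ordering is under test.
pub trait Steps {
    fn persist(&mut self) -> Result<(), String>;
    fn maintain(&mut self) -> Result<(), String>;
}

/// Persists state first, then runs maintenance. The journal flag is
/// cleared only after both steps succeed.
pub fn finish_operation<S: Steps>(
    steps: &mut S,
    journal_exists: &mut bool,
) -> Result<(), String> {
    steps.persist()?;
    steps.maintain()?;
    *journal_exists = false;
    Ok(())
}

/// Test double: persistence succeeds, maintenance always fails.
#[derive(Default)]
pub struct FailingMaintenance {
    pub persisted: bool,
}

impl Steps for FailingMaintenance {
    fn persist(&mut self) -> Result<(), String> {
        self.persisted = true;
        Ok(())
    }

    fn maintain(&mut self) -> Result<(), String> {
        Err("injected maintenance failure".to_string())
    }
}
```

If someone reorders `finish_operation` so that maintenance runs before persistence, `persisted` stays false and the test fails, which is the property the invariant needs.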
When code touches kernel async workers, mount-session caches, or device-layer teardown, mocked unit tests are not enough. Run the relevant VM or repro test, inspect full logs when it fails, and repeat timing-sensitive repros enough to rule out a lucky pass.
For cmd_* boolean gates derived from multiple inputs, route both branches
through the same injected seam and test the matrix cells that distinguish the
intended gate from plausible wrong gates.
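The matrix part of this advice can be sketched as follows. The gate and its inputs are hypothetical (`may_proceed`, `pool_mounted`, `balance_running`), and the sketch does not show the injected seam. Each row is annotated with the wrong gate it rules out.

```rust
/// Hypothetical gate: proceed only when the pool is mounted and no
/// balance is running.
pub fn may_proceed(pool_mounted: bool, balance_running: bool) -> bool {
    pool_mounted && !balance_running
}

/// Truth table as (pool_mounted, balance_running, expected).
pub const GATE_MATRIX: [(bool, bool, bool); 4] = [
    // The only cell where the gate opens.
    (true, false, true),
    // Rules out the wrong gate `pool_mounted` alone.
    (true, true, false),
    // Rules out the wrong gate `!balance_running` alone.
    (false, false, false),
    // Rules out an inverted `pool_mounted` check.
    (false, true, false),
];
```

A test then iterates over `GATE_MATRIX` and compares `may_proceed` against the expected column. Testing only the first row would pass for several plausible wrong gates.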
For one-off sequenced or stateful command-test behavior, prefer a file-local
runner or wrapper over widening the shared MockRunner. Reserve shared runner
API changes for behavior that many tests need.
When removing sleep wall-time from tests, inject a sleeper dependency. Do not
use #[cfg(test)] to zero a production timing constant whose value is part of
the behavior.
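One way to inject a sleeper, sketched with hypothetical names (`Sleeper`, `RecordingSleeper`, `retry`, and the 2-second `RETRY_DELAY` are all assumptions for illustration). The production constant stays untouched; the test swaps in a double that records the requested durations instead of waiting.

```rust
use std::time::Duration;

/// Production timing constant. Its value is part of the behavior, so it
/// is never zeroed under #[cfg(test)].
pub const RETRY_DELAY: Duration = Duration::from_secs(2);

/// Hypothetical seam for anything that needs to wait.
pub trait Sleeper {
    fn sleep(&mut self, duration: Duration);
}

/// Production implementation: really sleeps.
pub struct ThreadSleeper;

impl Sleeper for ThreadSleeper {
    fn sleep(&mut self, duration: Duration) {
        std::thread::sleep(duration);
    }
}

/// Test double: records each requested sleep and returns immediately.
#[derive(Default)]
pub struct RecordingSleeper {
    pub slept: Vec<Duration>,
}

impl Sleeper for RecordingSleeper {
    fn sleep(&mut self, duration: Duration) {
        self.slept.push(duration);
    }
}

/// Calls `op` up to `attempts` times, sleeping RETRY_DELAY between
/// failed attempts. Returns true as soon as `op` succeeds.
pub fn retry<S: Sleeper>(sleeper: &mut S, attempts: u32, mut op: impl FnMut() -> bool) -> bool {
    for attempt in 0..attempts {
        if op() {
            return true;
        }
        if attempt + 1 < attempts {
            sleeper.sleep(RETRY_DELAY);
        }
    }
    false
}
```

With `RecordingSleeper`, a test can assert both that the operation eventually succeeded and that the real `RETRY_DELAY` was requested between attempts, without spending any wall time.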
Eval-time test isolation: disable, don’t stub
When an eval-time test (lib.evalModules in isolation) breaks because of a new NixOS option dependency, disable the unrelated feature in the test config rather than expanding the fake module surface with stubs.
Stubbing options (e.g. adding options.users) makes the test less isolated and can mask future accidental dependencies on unrelated NixOS top-level options. Disabling the feature that introduced the dependency keeps the test focused.
When fixing eval-time test failures caused by new module dependencies, first check if the dependency comes from a feature the test doesn’t need. If so, set that feature’s config to its “off” value (e.g. poolAccessGroup = null) instead of adding option stubs.
Parser compatibility
braid parses output from btrfs-progs, cryptsetup, util-linux, smartmontools, NUT, and ethtool. These parsers can break when tool versions change. Two validation lanes exist:
Stable lane (pinned contract)
- just test-parsers – CLI parser canary. Exercises CLI-reachable parsers against live tool output in VMs (including braid-status-ups, the NUT canary).
- just test-rust – validates golden fixtures for the full parser set, including parse_upsc. Fixture-backed coverage stays current only after running just capture-all-fixtures when parser-critical tool versions change (e.g. nixpkgs bump).
- Fixture refresh is a separate obligation: just test-parsers passing does not guarantee TUI-only parsers (parse_lsblk_json, parse_cryptsetup_luks_dump) or unused parsers (parse_btrfs_scrub_status_per_device) are compatible with the current toolchain. parse_smartctl (the SMART health parser) is reachable from both the TUI and the braid status CLI command, so it is no longer TUI-only. It is still not covered by the live VM canary, though: virtio disks emit no usable SMART, so just test-parsers cannot exercise it. Its drift canary is the stable-only smartctl golden fixture (see the smartctl-fixtures note below).
- Fixtures in cli/tests/fixtures/nixos-26.05/ are committed and authoritative. NUT fixtures live in cli/tests/fixtures/nixos-26.05/upsc/ (and the unstable mirror); they are produced by just capture-ups-fixtures, which boots a dedicated NUT VM with per-state dummy-ups drivers (see tests/capture-ups-fixtures.nix).
- smartctl fixtures are stable-only by design. VM virtio disks do not emit useful SMART data, so just capture-all-fixtures does not regenerate smartctl-sata-with-temperature.json or smartctl-selftest-*.json. smartctl-sata-with-temperature.json is a one-time physical-drive capture; smartctl-selftest-*.json fixtures are hand-authored (see cli/tests/fixtures/nixos-26.05/README.md). The tool-versions VM test checks that smartctl resolves to a /nix/store/ path on the VM’s PATH and that its self-reported version matches pkgs.smartmontools.version, but it does not detect nixpkgs version bumps because both sides advance together. On any nixpkgs bump that touches smartmontools, manually review and refresh smartctl-selftest-*.json against the new ata_smart_self_test_log.standard JSON shape and smartctl-sata-with-temperature.json against the new health/temperature JSON shape (smart_status, temperature, ata_smart_attributes).
- ethtool WoL fixtures are hand-authored / no-live-capture. VM virtio NICs do not emit useful Wake-on-LAN data, so just capture-all-fixtures does not regenerate ethtool output. The doctor wake_on_lan parser is covered by hand-authored Rust unit fixtures, and wrapper provenance is covered by the override-based VM tests in tool-versions and braid-auto-suspend.
Parser-critical tool versions are the pinned nixpkgs versions of btrfs-progs, cryptsetup, util-linux, nut, smartmontools, and ethtool. Treat any change to the nixpkgs node in flake.lock, any flake.nix change that alters the nixpkgs input, or any change to braid.packages.{btrfsProgs,cryptsetup,utilLinux,nut,smartmontools,ethtool} as a required fixture-refresh event.
When parser-critical tool versions change, run:
1. just capture-all-fixtures
2. just test-rust
3. just test-parsers
Unstable lane (tracked forecast)
Early-warning lane for upstream parser/output drift. Unstable failures signal upcoming changes, not a contract violation. Fixtures in cli/tests/fixtures/nixos-unstable/ are committed so upstream output changes are visible in git history, but they are non-authoritative.
- just test-all-unstable – VM tests against nixos-unstable. Covers CLI-reachable parsers against live tool output but does not cover the full parser surface (TUI-only parsers, unused parsers, smartctl).
- just capture-all-fixtures-unstable + just test-rust-unstable – covers btrfs/cryptsetup/util-linux/NUT against unstable tool output via golden fixtures. Missing fixtures fail (not skip).
- smartctl and ethtool have no unstable fixtures. Unstable capture/test coverage intentionally covers btrfs/cryptsetup/util-linux/NUT only; see the Stable lane for why smartctl fixtures are stable-only and how to refresh them on smartmontools bumps, and why ethtool WoL output is hand-authored instead of live-captured.
Full unstable canary workflow:
1. just test-all-unstable
2. just capture-all-fixtures-unstable
3. just test-rust-unstable
Reference source
Before searching the web for tool behavior, consult local resources first. reference/ contains shallow clones of upstream repos at the versions pinned in nixpkgs, plus Rust crate sources pinned in Cargo.lock. Refresh with just fetch-references.
When to look: Any time you’re implementing, modifying, or debugging code that interacts with these tools — especially parsers. Read the relevant source before making assumptions about output format or behavior.
- btrfs-progs – kdave/btrfs-progs
  - Source: reference/btrfs-progs/cmds/ – one file per subcommand (e.g. cmds/scrub.c). Parser output formats, exit codes.
  - Docs: reference/btrfs-progs/Documentation/ – RST. See btrfs docs below for the topic table.
- systemd – systemd/systemd
  - Source: reference/systemd/src/ – unit lifecycle internals, systemd-ask-password, mount/automount.
  - Docs: reference/systemd/docs/ – markdown design docs (BOOT.md, INHIBITOR_LOCKS.md, MOUNT_REQUIREMENTS.md, CREDENTIALS.md, PASSWORD_AGENTS.md, etc.). reference/systemd/man/ – XML man-page sources for unit/option reference (systemd.service.xml, systemd.mount.xml, …).
- autosuspend – languitar/autosuspend
  - Source: reference/autosuspend/src/ – check classes, config schema, wakeup scheduling.
  - Docs: reference/autosuspend/doc/source/ – RST (available_checks.rst, available_wakeups.rst, configuration_file.rst, systemd_integration.rst).
- cryptsetup – cryptsetup/cryptsetup
  - Source: reference/cryptsetup/src/ (CLI), reference/cryptsetup/lib/ (libcryptsetup) – luksDump output, LUKS2 header structure, keyslot operations.
  - Docs: reference/cryptsetup/man/ – *.8.adoc man pages (cryptsetup-luksDump.8.adoc, cryptsetup-open.8.adoc, …). reference/cryptsetup/docs/ – design notes including LUKS2-locking.txt and on-disk-format-luks2.pdf.
- util-linux – util-linux/util-linux
  - Source: reference/util-linux/misc-utils/ (lsblk, blkid), reference/util-linux/sys-utils/ (mount, umount), reference/util-linux/libmount/, reference/util-linux/libblkid/ – lsblk JSON schema, blkid output, mount/unmount behavior.
  - Docs: Man pages live next to source as *.8.adoc (e.g. misc-utils/lsblk.8.adoc, sys-utils/mount.8.adoc). reference/util-linux/Documentation/ is project meta (build/test/contribution notes), not user reference.
- smartmontools – smartmontools/smartmontools
  - Source: reference/smartmontools/smartmontools/ – flat layout. smartctl output format, SMART attribute definitions, exit codes.
  - Docs: No separate docs dir. Man-page sources are inline alongside the code: smartctl.8.in, smartd.8.in, smartd.conf.5.in.
- hddfancontrol – desbma/hddfancontrol
  - Source: reference/hddfancontrol/src/ – Rust daemon. device/ (drivetemp, hddtemp, smartctl probing), probe/ (pwm-test ramp logic), fan.rs (PWM control), pwm.rs (sysfs PWM I/O), cl.rs (CLI args).
  - Docs: No separate docs dir. reference/hddfancontrol/README.md and reference/hddfancontrol/systemd/hddfancontrol.service – the upstream unit we intentionally don’t use (see modules/braid/fan-control.nix).
- nut – networkupstools/nut
  - Source: reference/nut/clients/ (upsmon.c – shutdown-on-LB daemon, upsc.c – status query, upscmd.c, upssched.c, upsrw.c), reference/nut/server/ (upsd.c and net protocol handlers), reference/nut/drivers/ (usbhid-ups.c and per-vendor *-hid.c for the USB HID path v1 targets).
  - Config schema: reference/nut/conf/ – sample files (nut.conf.sample, ups.conf.sample, upsd.conf.sample, upsd.users.sample, upsmon.conf.sample.in, upssched.conf.sample.in). Authoritative for fields braid generates into /etc/nut/*.
  - Docs: reference/nut/docs/man/ – *.txt asciidoc man pages for daemons, drivers, and config files. reference/nut/docs/ – design notes (design.txt, net-protocol.txt, developer-guide.txt, new-drivers.txt, FAQ.txt).
- linux – torvalds/linux
  - Source: reference/linux/ – kernel source at the exact version pinned in nixpkgs. Look in fs/btrfs/ for btrfs-specific I/O scheduling, raid handling, and read balancing logic. drivers/md/ for raid and block layer behavior.
  - Use for: Understanding kernel-level I/O behavior, raid1 read balancing, mount semantics, block device management.
- coreutils – coreutils/coreutils (GitHub mirror of GNU Coreutils)
  - Source: reference/coreutils/src/ – one C file per utility (e.g. src/timeout.c, src/realpath.c, src/stat.c, src/chmod.c, src/chown.c, src/head.c, src/base64.c). Read these to confirm what each helper actually guarantees – e.g. timeout(1) exit-code semantics and signal forwarding live in src/timeout.c, not in any manpage.
  - Docs: reference/coreutils/doc/coreutils.texi – the canonical reference manual (per-utility sections inside one big Texinfo file). Per-utility manpage stubs live in reference/coreutils/man/ as *.x (e.g. man/timeout.x); these are short prologues that get merged with --help output by help2man at build time, so the full prose is in coreutils.texi.
  - Use for: Any time braid code or a plan reasons about a Coreutils helper’s behavior beyond the obvious: exit codes, signal handling, race windows, --help text, edge cases. Especially timeout(1): timeout cannot bound an uninterruptible kernel wait, and the proof is in src/timeout.c’s use of kill() against a userspace child.
- nix (Rust crate) – nix-rust/nix
  - Source: reference/nix-crate/src/ – Rust crate at the version pinned in Cargo.lock, not flake.lock. unistd.rs (User/Group/chown/exec helpers, fd ownership types), fcntl.rs (open, flock, OFlag), errno.rs (Errno), sys/stat.rs (Mode), sys/signal.rs (sigaction, signal handlers), sys/termios.rs (termios constants, terminal flags).
  - Docs: No separate docs dir – rustdoc is inline as /// doc comments on each item. reference/nix-crate/Cargo.toml declares the feature gates (braid currently enables fs, user, term, and signal); consult it before reaching for a nix API to confirm which feature it lives under.
  - Use for: Touching any nix:: API, checking feature gates, understanding fd-ownership types, signal-safe helpers, or termios constants. Refresh after any change to the nix line in cli/Cargo.toml or any cargo update-driven bump in Cargo.lock.
btrfs docs
- Docs: reference/btrfs-progs/Documentation/ – RST docs from btrfs-progs. Start with index.rst for a full table of contents, or use the topic table below for common lookups. Glob by keyword for anything not in the table. ch-* fragments are inlined by just fetch-references.
| Topic | File(s) |
|---|---|
| Adding/removing devices | Volume-management.rst, btrfs-device.rst |
| Device replacement | btrfs-replace.rst |
| Rebalancing | Balance.rst, btrfs-balance.rst |
| RAID profiles (RAID1 etc.) | mkfs.btrfs.rst (search for “profiles”) |
| Mount options | btrfs-man5.rst |
| Scrub / self-healing | Scrub.rst |
| Filesystem limits & storage model | btrfs-man5.rst |
| Administration overview | Administration.rst |
Citing reference/ code
braid’s own tracked files are cited by path#symbol or path#heading-slug
(doc and ADR file references); external upstream code is
different. It lives in reference/, which is gitignored and refreshed wholesale
by just fetch-references: it is absent on a clean checkout and invisible to CI.
A line number into it drifts on every refresh, and a braid-style path#symbol is
not greppable when the file is not on disk – neither form validates or even
resolves. Cite external upstream code by its shape:
- Short, behavior-defining snippet – one line or small function emitting a format, token, or exit code braid parses. Inline the excerpt as frozen ground truth, so a reader sees the contract without fetching reference/. Stamp it pkg <version>, <path> (fn name) and drop the line number. Fence the excerpt with a non-rust language tag – c for source, text for tool output – so rustdoc does not run it as a doctest. An unannotated or rust-tagged block becomes a failing doctest, caught by cargo test -p braid-cli --doc (not just test-rust, whose --lib --bin --test selectors skip doctests). Precedent: cli/src/parse/cryptsetup_luks_version.rs#parse_cryptsetup_luks_version. An inline code span (`printf(...)`) is fine for a tight function or field doc where a fenced block is too heavy. The pkg <version> stamp is the upstream release tag (git -C reference/<pkg> describe --tags); it pins the excerpt and is the re-verify trigger when a nixpkgs bump changes that tool’s version – the same parser-compatibility refresh event that recaptures fixtures.
- Region or multi-line – a code area with no single quotable line (a long function, a struct, two scattered lines). Keep a pointer, not a wall of inlined code: pkg <version>, <path> (fn name) plus a one-line paraphrase of what’s there. Prefer a function name over a line number; a bare line range is a last resort.
Existing bare-line-number reference/ citations are tolerated – nothing validates them
either way – but migrate them toward the excerpt or pointer form when you next touch the
surrounding file.
TUI Snapshot Testing with Ratatui + Insta
Rendering for snapshots
Each TUI view module’s #[cfg(test)] block defines a small render that
draws the view into a TestBackend, then asserts via the shared snap!
helper. render is per-module (it calls that module’s own view function);
buffer_to_string and the snap! macro are shared from
cli/src/tui/test_support.rs.
use crate::tui::test_support::{buffer_to_string, snap};

// Per-module: calls this view's draw fn with a fixed `now` for determinism.
fn render(model: &Model, width: u16, height: u16) -> Terminal<TestBackend> {
    let now = time::macros::datetime!(2026-02-24 02:12:00);
    let mut terminal = Terminal::new(TestBackend::new(width, height)).unwrap();
    terminal.draw(|frame| view(model, frame, now)).unwrap();
    terminal
}

#[test]
fn snapshot_with_pool() {
    let model = Model::new_demo(sample_disk_names(), PoolStatus::Mounted(sample_pool()));
    snap!(buffer_to_string(&render(&model, 60, 24)));
}
snap! wraps insta::assert_snapshot! in
insta::with_settings!({ prepend_module_to_snapshot => false }, ...).
That setting defaults to true; we force it off so snapshot files are
named after the test alone (snapshot_with_pool.snap), not
braid_cli__tui__view__tests__snapshot_with_pool.snap. Always go through
snap! – a bare insta::assert_snapshot! would reintroduce the prefix
and write to a different filename.
insta could snapshot the TestBackend directly (it implements Display),
but buffer_to_string trims trailing whitespace per line for cleaner
diffs, so all view tests assert on its String. Styles/colors are not
captured – text only.
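The per-line trimming is what keeps padded terminal rows from showing up as noise in snapshot diffs. A stand-in sketch of that step on a plain string (the function name `trim_trailing_per_line` is hypothetical, and this is not braid's `buffer_to_string`, which reads from the backend buffer):

```rust
/// Trims trailing whitespace from every line and rejoins with '\n'.
/// Leading whitespace is kept, since indentation is part of the layout.
pub fn trim_trailing_per_line(rendered: &str) -> String {
    rendered
        .lines()
        .map(str::trim_end)
        .collect::<Vec<_>>()
        .join("\n")
}
```

A row that is entirely blank in the terminal becomes an empty line, so a view that shrinks by a few columns changes only the rows whose text actually moved.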
The cargo insta workflow
- cargo test – runs tests normally. New/changed snapshots fail and produce .snap.new files alongside the existing .snap files.
- cargo insta review – interactive TUI that walks through each pending change with diffs. Keys: a accept, r reject, s skip.
- cargo insta accept – bulk-accepts all pending .snap.new files without review.
Shortcut: cargo insta test --review runs tests then immediately opens the review TUI.
Typical cycle
# Write or change a test → run tests
just test-rust
# Tests fail because snapshot is new/different → .snap.new files appear
# Review the diffs interactively
cargo insta review
# Or if you trust the output, bulk accept
cargo insta accept
# Commit the .snap files
For first-time snapshots (no .snap file yet), cargo test will always fail — run cargo insta review or cargo insta accept to create the initial .snap.
What ratatui recommends
- TestBackend + insta for integration-level view tests (what we do)
- Buffer::empty() + direct render for unit-testing individual widgets in isolation, asserting on buffer contents without a full terminal
- Consistent terminal dimensions (e.g., 80x20) for reproducible snapshots
Planning and review hygiene
- Re-read the central files immediately before writing or reviewing a plan; do not rely on earlier conversation reads when code may have changed.
- For renames, refactors, and callsite sweeps, derive the inventory from tracked files with git ls-files plus rg. Be explicit about exclusions and rerun the same search as verification.
- Before planning recovery or cleanup recipes, verify every step against the current cmd_*/plan_* code and the relevant tool or kernel behavior. Treat issue recipes as hypotheses until the code proves them.
- Architecture docs describe behavioral contracts, not internal helper names. Verify wrapper process/lifetime claims from the wrapper code before writing docs that depend on them.
- For external-tool exit-code or wording classifiers, trace the specific subcommand return path in reference/; a shared errno table is not enough to prove one invocation’s behavior.
Mutation safety heuristics
These heuristics elaborate on principle 3 (safe-by-construction operations) for contributors.
- Query the authoritative source of state directly; do not pre-gate it with a cheaper but weaker observable such as path existence.
- Put invariant checks at the layer that owns the invariant. Primitive-level checks belong inside the helper that performs the unsafe operation; caller policy gates belong at callsites.
- Keep diagnostic refinements out of mutating-command state enums when the new distinction only matters for status, doctor, TUI, or error rendering.
- Set fail-closed policy from the downstream failure mode. If a branch can corrupt state or strand a journal when a preflight is wrong, every uncertainty in that branch is a hard error even if a sibling branch can warn and proceed.
- Residual invariant checks must be hard errors in all builds; do not replace a production guard with debug_assert!.
- Split post-commit failure variants by the operator’s remediation and on-disk consequence, not by implementation layer.
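Two of the heuristics above (query the authoritative source directly, and keep residual invariant checks as hard errors) can be sketched in a few lines of Rust. This is a minimal illustration only: `Probe`, `probe`, `InvariantViolation`, and `check_device_count` are hypothetical names invented for the example and do not come from braid's code.

```rust
use std::fs;
use std::io;
use std::path::Path;

// Hypothetical types and helpers for illustration; not braid's actual code.

#[derive(Debug, PartialEq)]
enum Probe {
    Present(String),
    Absent,
}

/// Ask the filesystem by performing the read, instead of pre-gating on
/// `path.exists()`, which returns `false` for permission errors as well as
/// for missing files.
fn probe(path: &Path) -> io::Result<Probe> {
    match fs::read_to_string(path) {
        Ok(contents) => Ok(Probe::Present(contents)),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(Probe::Absent),
        // Any other failure is uncertainty, so it stays an error (fail closed).
        Err(e) => Err(e),
    }
}

#[derive(Debug, PartialEq)]
struct InvariantViolation(String);

/// Guard that lives in the helper owning the invariant. It returns a hard
/// error in every build; a `debug_assert!` here would be compiled out of
/// default release builds.
fn check_device_count(expected: usize, found: usize) -> Result<(), InvariantViolation> {
    if expected != found {
        return Err(InvariantViolation(format!(
            "expected {expected} devices, found {found}"
        )));
    }
    Ok(())
}

fn main() {
    // A missing file is reported as `Absent`, not as an error.
    let missing = Path::new("/nonexistent/braid-example");
    assert_eq!(probe(missing).unwrap(), Probe::Absent);

    // The guard rejects a mismatch regardless of build profile.
    assert!(check_device_count(2, 2).is_ok());
    assert!(check_device_count(2, 1).is_err());
}
```

The point of the first helper is that only "not found" is mapped to a benign state; everything else propagates, so a caller cannot mistake "could not tell" for "absent".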
Doc and ADR file references
In ADRs, decision docs, and docs/ prose, never reference another file by line
number. Line numbers drift the moment surrounding code or text is edited, so the
pointer silently goes stale and misleads the next reader. Use a path#anchor
reference instead – one shape for both code and docs, where the anchor names
what and the path says where:
- Code – path#symbol as a plain code span, not a link: (see `cli/src/cmd/unlock.rs#cmd_unlock`). The symbol is a fn, struct, enum, trait, impl, module, or const, method-qualified where it helps (cli/src/cmd/plan.rs#Planner::plan). The symbol is the drift-proof, greppable half – one rg cmd_unlock finds both the citation and the definition. Never write cli/src/cmd/unlock.rs:142, and do not linkify code paths: cli/ lives outside the mdBook root, so a link 404s in the rendered book and dodges linkcheck. A bare file path (no #symbol) is fine when the whole file is the referent.
- Markdown / mdBook – path#heading-slug as a real Markdown link, e.g. [...](docs/internals/luks-unlock.md#header-backup-workflow-and-messaging), not a line number or section count. Unlike code refs these are clickable and validated by mdbook-linkcheck2, so a renamed heading fails CI instead of rotting silently.
A symbol or heading anchor survives edits and is greppable; a line number is
neither. This applies to docs and comments – transient analysis in plans/wip/
is exempt.
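To make the difference between the two reference shapes concrete, here is a rough sketch of a check that flags `path:LINE` references while leaving `path#symbol` references alone. It is a hypothetical helper written for this example, not a tool that braid ships, and the heuristic is deliberately crude: a token such as `example.com:8080` would also be flagged.

```rust
/// Hypothetical lint: return whitespace-separated tokens shaped like
/// `path.ext:LINE`. Not part of braid; for illustration only.
fn line_number_refs(text: &str) -> Vec<&str> {
    text.split_whitespace()
        // Strip surrounding punctuation such as backticks, parens, and commas.
        .map(|tok| tok.trim_matches(|c: char| !c.is_alphanumeric() && c != '/'))
        .filter(|tok| match tok.rsplit_once(':') {
            // Flag only `something.with.dot:digits`.
            Some((path, line)) => {
                path.contains('.')
                    && !line.is_empty()
                    && line.chars().all(|c| c.is_ascii_digit())
            }
            None => false,
        })
        .collect()
}

fn main() {
    let prose = "see `cli/src/cmd/unlock.rs:142`, \
                 `cli/src/cmd/unlock.rs#cmd_unlock`, and \
                 `cli/src/cmd/plan.rs#Planner::plan`";
    // Only the line-number form is reported; `Planner::plan` is not,
    // because the text after its last colon is not numeric.
    assert_eq!(line_number_refs(prose), vec!["cli/src/cmd/unlock.rs:142"]);
}
```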
This rule governs braid’s own tracked files; external upstream code under
reference/ is gitignored and cited differently (by shape, not path#symbol) –
see Citing reference/ code.
Decision-doc references
A decision doc with status: Superseded or Deprecated is a point-in-time
record. Do not rewrite its body or ## See section to track current code – the
> Superseded by ... banner is the forward pointer to live artifacts. Repointing
a frozen doc’s references at today’s successor code only makes it contradict its
own narrative.
Independent of status, a ## See bullet whose path no longer resolves is a broken
pointer, not history. Drop it; or, if the removed file has lasting reference value
(an archived design doc or plan – not deleted dead code), replace the bare path
with the git-history-note form used in 002 and 003:
(preserved in git history; last present at commit <hash>). The ## See path
half of this rule is enforced by scripts/docs/check-see-paths.py.
Rust doc comments
When adding a new top-level function, type, module, trait, or
pub/pub(crate) item in the Rust CLI, add a /// doc comment justifying
why it exists at that boundary. Capture intent, invariant, ownership, or
call-site coupling – not the signature.
Prefer one to three lines. If removing the comment would not lose any information a reader could not recover from the code, do not write it.
Skip:
- Trait impls whose purpose is the trait (Display, Debug, From, Default, …)
- Enum variants already covered by an enum-level doc
- #[cfg(test)] items and test fixtures
Good:
- “Shared mapper ownership classifier so planner and executor use the same LUKS UUID invariant.”
- “Separate from MountState because we observe LUKS state without holding the pool lock.”
Bad:
- “Returns mapper ownership.” (restates signature)
- “Helper used by the planner.” (vague)
- “Caller must ensure path is canonical.” (fabricated invariant nothing enforces)
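In source form, the first "Good" comment above might sit on an item like the one below. The item itself is a hypothetical sketch: `MapperOwner` and `classify_mapper`, along with the plain string comparison, are assumptions made for the example rather than braid's real types or logic.

```rust
// Hypothetical items for illustration; not braid's actual code.

/// Kept as an enum rather than a `bool` so callers must handle a foreign
/// owner explicitly instead of treating "not ours" as "absent".
#[derive(Debug, PartialEq)]
enum MapperOwner {
    Ours,
    Foreign,
}

/// Shared mapper ownership classifier so planner and executor use the same
/// LUKS UUID invariant.
fn classify_mapper(expected_uuid: &str, mapper_uuid: &str) -> MapperOwner {
    if expected_uuid == mapper_uuid {
        MapperOwner::Ours
    } else {
        MapperOwner::Foreign
    }
}

// By contrast, `/// Returns mapper ownership.` on the same function would add
// nothing that the signature does not already say.

fn main() {
    assert_eq!(classify_mapper("abc", "abc"), MapperOwner::Ours);
    assert_eq!(classify_mapper("abc", "def"), MapperOwner::Foreign);
}
```

The variants carry no doc comments of their own, which matches the skip list: the enum-level comment already covers them.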
Rust CLI only. Nix module options use NixOS option description fields;
shell scripts and Python tests follow their own conventions (see
testing.md).