Active – Refines 017-runtime-disk-membership.md.
Decision: LUKS UUID Is Disk Identity
Principle: Stable identifiers
Context
Runtime membership originally used the operator disk name as the key in
pool.json. The same name also appears in mapper names and LUKS labels, so
code could accidentally treat display/runtime handles as identity. That made
label drift, mapper drift, and cloned disks hard to reason about: a member
could be the same encrypted device while its label or mapper path changed, or
two different by-id paths could expose the same cloned LUKS header.
Decision
Use the LUKS UUID as the persistent disk identity. pool.json and
pending-op.json membership snapshots are keyed by canonical LUKS UUIDs.
DiskMember.name remains the operator-facing name and DiskMember.by_id
remains the hardware address used to reach the device. DiskMember.devid is
persisted only as prior-binding state for btrfs cases where the live device is
observable by devid but not by LUKS UUID, such as null_underlying mappers and
missing_devids.
Fresh add and replace operations pre-generate the UUID that cryptsetup must
write, store that UUID in the journal before mutation, and pass it through the
structured CryptsetupLuksFormat request. User-supplied --luks-format-arg
values may not override --uuid or --label.
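
As a minimal sketch of that rule, the function below builds a luksFormat argv from a pre-generated UUID and refuses extras that try to set the reserved flags. The names `build_luks_format_argv` and `FormatArgError` are hypothetical, and where the user extras sit relative to the device path is an assumption; only the `--uuid <uuid> --label <label>` ordering and the `braid-<DiskName>` label come from this document.

```rust
/// Hypothetical error type: a user extra tried to set a flag that the
/// structured request owns.
#[derive(Debug, PartialEq)]
pub enum FormatArgError {
    ReservedFlag(String),
}

/// Build the argv for a fresh LUKS format.
/// `uuid` is assumed to be the pre-generated UUID already stored in the journal.
pub fn build_luks_format_argv(
    uuid: &str,
    disk_name: &str,
    by_id: &str,
    extra: &[&str],
) -> Result<Vec<String>, FormatArgError> {
    for &arg in extra {
        // Treat both `--uuid X` and `--uuid=X` spellings as reserved.
        let flag = arg.split('=').next().unwrap_or(arg);
        if flag == "--uuid" || flag == "--label" {
            return Err(FormatArgError::ReservedFlag(flag.to_string()));
        }
    }
    let mut argv: Vec<String> = vec![
        "cryptsetup".to_string(),
        "luksFormat".to_string(),
        "--uuid".to_string(),
        uuid.to_string(),
        "--label".to_string(),
        format!("braid-{}", disk_name),
    ];
    // Assumption: user extras follow the structured flags and precede the device.
    argv.extend(extra.iter().map(|s| s.to_string()));
    argv.push(by_id.to_string());
    Ok(argv)
}
```

Because the UUID is an input rather than something cryptsetup chooses, the same value can be journaled first and compared against the on-disk header during recovery.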
Identity Boundaries
| Identifier | Role | Persistent identity? | Normal user vocabulary? |
|---|---|---|---|
| LuksUuid | Encrypted-volume identity used for membership correlation, journals, duplicate detection, and live probe checks. | Yes | No |
| DiskName | Operator-facing name used in commands, status summaries, mapper suffixes, and labels. | No | Yes |
| ByIdPath | Hardware address used to find, open, or format a disk before it is mapped. | No | Setup and repair only |
| DiskMember.devid | Prior btrfs binding used when btrfs can report a device by devid but no live LUKS UUID is observable. | Fallback binding only | Repair diagnostics only |
| braid-<DiskName> mapper name | Runtime handle passed to cryptsetup, btrfs, mount, and close operations. | No | Mostly hidden |
| braid-<DiskName> LUKS label | Human/debug label for LUKS headers and discovery bootstrapping. | No | Mostly hidden |
This means UUID identity does not move normal command vocabulary from names to
UUIDs. Operators still add, replace, remove, and read disks by names such as
toshiba1. UUIDs belong in pool.json, journals, machine-readable status, and
diagnostics where braid must prove that the encrypted member is the expected
one.
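
The split between the identity key and the operator vocabulary can be sketched as a UUID-keyed map with a name lookup on top. This is an illustration under assumptions, not braid's actual types: UUIDs are plain strings here, and `Membership`, `uuid_for_name`, `insert_member`, and `InsertError` are hypothetical names. The four duplicate axes mirror the checks listed under Tests That Enforce This.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq)]
pub struct DiskMember {
    pub name: String,       // operator-facing name, e.g. "toshiba1"
    pub by_id: String,      // hardware address used to reach the device
    pub devid: Option<u64>, // prior btrfs binding, fallback only
}

/// Membership keyed by canonical LUKS UUID (a plain string in this sketch).
pub type Membership = BTreeMap<String, DiskMember>;

/// Commands take names; this resolves a name to the UUID key.
pub fn uuid_for_name<'a>(members: &'a Membership, name: &str) -> Option<&'a str> {
    members
        .iter()
        .find(|(_, m)| m.name == name)
        .map(|(uuid, _)| uuid.as_str())
}

#[derive(Debug, PartialEq)]
pub enum InsertError {
    DuplicateUuid,
    DuplicateName,
    DuplicateById,
    DuplicateDevid,
}

/// Insert a member, rejecting duplicates on every axis.
pub fn insert_member(
    members: &mut Membership,
    uuid: &str,
    member: DiskMember,
) -> Result<(), InsertError> {
    if members.contains_key(uuid) {
        return Err(InsertError::DuplicateUuid);
    }
    for existing in members.values() {
        if existing.name == member.name {
            return Err(InsertError::DuplicateName);
        }
        if existing.by_id == member.by_id {
            return Err(InsertError::DuplicateById);
        }
        if member.devid.is_some() && existing.devid == member.devid {
            return Err(InsertError::DuplicateDevid);
        }
    }
    members.insert(uuid.to_string(), member);
    Ok(())
}
```

An operator command such as a remove by name would call `uuid_for_name` once and use the UUID key for every later membership decision.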
Benefits
- Single source of truth. pool.json has one persistent member identity: the LUKS UUID map key. Disk name, by-id path, and btrfs devid no longer duplicate or compete with a value-side luks_uuid field.
- Drift-tolerant member correlation. Commands resolve membership by UUID instead of reconstructing identity from braid-<name>. A member opened under a drifted mapper can still be recognized as the same disk, and cleanup paths close the observed mapper rather than the expected one.
- Safer recovery replay. Journals carry UUID-keyed pre-operation and target membership snapshots. Recovery can compare the live pool against the journaled member set by UUID/devid and re-check live UUIDs before replaying format, add, replace, resize, or close steps.
- Earlier clone and swap detection. Duplicate LUKS UUIDs are rejected before membership writes or destructive operations, and UUID mismatches catch disks that were swapped, cloned, or reformatted after the original plan was made. add and replace also re-probe the mounted pool at execution time before writing the journal, so confirmation/passphrase-window races still hit the UUID guard.
- Human-facing names stay human-facing. Operators still type and read disk names such as toshiba1; mapper names and labels remain braid-<DiskName>. UUIDs appear where they help diagnostics or machine-readable state, not as the normal command vocabulary.
- Present-device probes use live paths. Queries such as lsblk model/serial and smartctl use the live backing path (PoolState::underlying_for_uuid), and the TUI disk-detail LUKS metadata dump (cryptsetup luksDump) reads the live backing path for a verified-present (Unlocked) member – not persisted by-id setup/repair handles that can drift while the disk is still present. Metadata for locked or ownership-unverified mappers stays on the by-id handle.
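
The execute-time re-probe mentioned under clone and swap detection can be sketched as a guard that runs just before the journal write. Only `DuplicateUuid { scope: LivePool }` is a name taken from this document; `LiveProbe`, `GuardError::PoolNotMounted`, `GuardError::FsidChanged`, and `pre_journal_guard` are hypothetical stand-ins for the "canonical pre-journal validation" refusals.

```rust
#[derive(Debug, PartialEq)]
pub enum DuplicateScope {
    LivePool,
}

#[derive(Debug, PartialEq)]
pub enum GuardError {
    PoolNotMounted,                                // hypothetical variant
    FsidChanged { planned: String, live: String }, // hypothetical variant
    DuplicateUuid { scope: DuplicateScope },
}

/// Result of a fresh probe of the mounted pool (hypothetical shape).
pub struct LiveProbe {
    pub fsid: String,
    /// One entry per live pool device; None when no LUKS UUID is observable.
    pub device_uuids: Vec<Option<String>>,
}

/// Re-check the live pool right before the journal is written.
/// `probe` is None when the pool is no longer mounted.
pub fn pre_journal_guard(
    planned_fsid: &str,
    target_uuid: &str,
    probe: Option<&LiveProbe>,
) -> Result<(), GuardError> {
    let probe = probe.ok_or(GuardError::PoolNotMounted)?;
    if probe.fsid != planned_fsid {
        return Err(GuardError::FsidChanged {
            planned: planned_fsid.to_string(),
            live: probe.fsid.clone(),
        });
    }
    // A live device already carrying the target UUID means a clone or a race.
    if probe
        .device_uuids
        .iter()
        .any(|u| u.as_deref() == Some(target_uuid))
    {
        return Err(GuardError::DuplicateUuid {
            scope: DuplicateScope::LivePool,
        });
    }
    Ok(())
}
```

Running the guard after confirmation and passphrase entry, rather than at plan time, is what closes the window in which a cloned disk could be added to the pool while the operator is still typing.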
Concrete Improvements
- Membership shape is simpler. Membership has one identity axis: UUID keys map to name/by-id/devid metadata.
- Formatting is crash-replayable. Fresh add and replace paths generate the UUID before mutation, journal it, and pass it to cryptsetup. Recovery can tell whether it is seeing the exact LUKS container that the interrupted plan intended to create.
- Cleanup follows observed ownership. lock classifies live mappers by UUID/devid and closes the mapper it actually observed. A mapper opened as braid-WRONG but owned by disk1 is closed as braid-WRONG; braid does not merely try braid-disk1 and leave the real mapper open.
- Recovery compares member sets by identity. Pending operations carry UUID-keyed pre-operation and target membership snapshots, so recovery can compare live topology with the journaled member set instead of re-discovering by label or assuming names still line up.
- Display code has an explicit join rule. User-facing summaries resolve a live pool device’s UUID back to DiskName for presentation. UUIDs remain available to verbose/machine-readable paths where they are useful evidence. The TUI Data-tab Bus column is the last display correlation to adopt this rule: its lsblk transport bridge now joins the parent disk’s LUKS UUID to the member name, so transport survives mapper drift like every sibling cell instead of blanking to --.
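
The display join rule can be illustrated with a lookup that ignores the mapper name entirely. This is a sketch with hypothetical names (`LiveDevice`, `name_for_device`); the map from UUID to operator name stands in for whatever membership view the display code holds.

```rust
use std::collections::BTreeMap;

/// A device as seen in the live pool (hypothetical shape).
pub struct LiveDevice {
    pub mapper: String,            // observed handle, possibly drifted
    pub luks_uuid: Option<String>, // None when no LUKS UUID is observable
}

/// Join a live device back to its operator name through the UUID only.
/// The mapper name is deliberately never parsed.
pub fn name_for_device<'a>(
    names_by_uuid: &'a BTreeMap<String, String>,
    dev: &LiveDevice,
) -> Option<&'a str> {
    let uuid = dev.luks_uuid.as_deref()?;
    names_by_uuid.get(uuid).map(String::as_str)
}
```

With this join, a member opened as braid-WRONG still renders under its real name, while a device with no observable UUID yields no name rather than a guess derived from the mapper basename.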
Runtime Handles And Labels
- Mapper names remain braid-<DiskName>.
- LUKS labels remain braid-<DiskName>.
- Both mapper names and labels are presentation/runtime handles, not identity.
- LuksUuid is the only persistent identity for membership decisions.
- Code may construct mapper_name(&member.name) when opening or addressing braid’s expected mapper.
- Code must not parse mapper names or LUKS labels to decide membership, target a member, or correlate live pool state. Narrow exceptions are allowed for bootstrapping and sanity checks only: discover bootstraps from cold braid-labeled disks; returning-disk adoption in add may gate on label match after identity correlation, which still uses LuksUuid/devid/FSID; fresh add and replace recovery may require the expected label before treating an already-formatted target as the crash-created LUKS container, but still requires the journaled UUID to match.
- lock may use the braid-* prefix only to discover cleanup candidates; member identity still requires UUID/devid evidence, and candidates whose backing LUKS UUID cannot be verified are warned and skipped.
- lock is the special cleanup case: classify live mappers by UUID/devid first, then close the observed mapper name, not a reconstructed mapper_name(&member.name), so drifted-but-member-owned mappers are closed correctly. If mounted per-device probing fails, lock reads the mounted filesystem FSID to key the exclusive-operation preflight (so it will not unmount mid balance/replace), then scans /dev/mapper/braid-* candidates and closes only those with verified backing LUKS UUIDs. The unmount is licensed by mount-point ownership, not an FSID identity match (see Limits And Non-Goals). If a null_underlying mapper’s persisted devid resolves to multiple membership UUIDs, lock warns, leaves that mapper open, and marks cleanup uncertain instead of demoting it to orphan cleanup.
- lock reports "disk <name>: already closed" only for members the planner has proved absent from every observed live state; it must not reconstruct mapper_name(&member.name) during execute to infer absence. If a mapper is skipped because classification failed, or /dev/mapper cannot be enumerated in either close-set arm, cleanup is uncertain and lock suppresses all already-closed claims for unobserved members.
- Commands that reuse an already-open expected mapper for a requested by-id path must verify the mapper’s canonical backing path before trusting the mapper’s LUKS UUID. A cloned LUKS header can give two physical devices the same UUID, so the runtime proof is backing path match first, then UUID match.
- Recovery must fail closed when a live btrfs device lacks an observable LUKS UUID and the journal has no persisted devid binding. It must not recover by inferring identity from braid-<DiskName>.
- replace must re-probe the mounted pool after confirmation and passphrase verification but before sleep inhibitor acquisition, journal write, or btrfs replace start. If the pool is no longer mounted, the FSID differs from the planned pool, or any live pool device has the replacement target’s LUKS UUID, replace fails closed with the canonical pre-journal validation or DuplicateUuid { scope: LivePool } refusal.
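
The lock classification rules above can be sketched as a pure function over one observed mapper. All names here (`Member`, `LiveMapper`, `CloseAction`, `classify`) are hypothetical. Closing a verified non-member as an orphan follows the mounted-fallback description under Limits And Non-Goals; skipping a devid that matches no member is this sketch's own assumption.

```rust
use std::collections::BTreeMap;

pub struct Member {
    pub name: String,
    pub devid: Option<u64>, // persisted prior binding
}

/// One live mapper as observed at lock time (hypothetical shape).
pub struct LiveMapper {
    pub observed_name: String,     // e.g. "braid-WRONG"
    pub luks_uuid: Option<String>, // None when the backing UUID cannot be verified
    pub devid: Option<u64>,        // btrfs devid, if btrfs reports one
}

#[derive(Debug, PartialEq)]
pub enum CloseAction {
    CloseMember { mapper: String, member: String },
    CloseOrphan { mapper: String },
    SkipUnverified { mapper: String },
    LeaveOpenAmbiguous { mapper: String },
}

/// Classify by UUID first, then by persisted devid. Whatever the outcome,
/// the action carries the observed mapper name, never a reconstructed one.
pub fn classify(members: &BTreeMap<String, Member>, m: &LiveMapper) -> CloseAction {
    let mapper = m.observed_name.clone();
    if let Some(uuid) = &m.luks_uuid {
        return match members.get(uuid) {
            Some(member) => CloseAction::CloseMember { mapper, member: member.name.clone() },
            None => CloseAction::CloseOrphan { mapper },
        };
    }
    if let Some(devid) = m.devid {
        let owners: Vec<&Member> = members
            .values()
            .filter(|x| x.devid == Some(devid))
            .collect();
        return match owners.as_slice() {
            [one] => CloseAction::CloseMember { mapper, member: one.name.clone() },
            // Assumption: a devid with no owner is warned about and skipped.
            [] => CloseAction::SkipUnverified { mapper },
            // Several members share the devid: leave open, cleanup is uncertain.
            _ => CloseAction::LeaveOpenAmbiguous { mapper },
        };
    }
    CloseAction::SkipUnverified { mapper }
}
```

Under this scheme, any `SkipUnverified` or `LeaveOpenAmbiguous` result would also mark cleanup as uncertain, which is the condition that suppresses "already closed" claims for unobserved members.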
Offline Disk State
A recorded member whose by-id path is present, whose LUKS header is readable,
and whose on-disk LUKS UUID matches the pool.json membership key is identity
verified. If that member is not assembled into the live btrfs pool, status and
TUI surfaces render it as offline, distinct from missing (device absent) and
unknown (braid cannot classify the state).
offline is deliberately cause-neutral. It can describe a locked member in a
degraded mount, an interrupted post-commit mutation, or another state where
membership and live btrfs topology have not yet been reconciled. Because those
causes have different remedies, braid status does not print an Action: hint
for offline rows.
braid doctor’s declared_disks check also surfaces an offline member as a
cause-neutral Warn, never Fail; Fail stays reserved for a live LUKS UUID
mismatch. When the pool is mounted but live topology cannot be probed, doctor
warns rather than claiming every declared member is assembled.
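
The status distinction between offline, missing, and unknown can be sketched as a small classifier. The variant `Assembled`, the `Observation` struct, and the function name are hypothetical. Mapping an unreadable header or a UUID mismatch to `Unknown` is an assumption of this sketch, since the text above only fixes how doctor treats a mismatch.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum MemberState {
    Assembled, // hypothetical name: identity verified and part of the live pool
    Offline,   // identity verified, not assembled into the live pool
    Missing,   // device absent
    Unknown,   // braid cannot classify the state
}

/// What a probe learned about one recorded member (hypothetical shape).
pub struct Observation<'a> {
    pub device_present: bool,
    pub on_disk_uuid: Option<&'a str>, // None when the LUKS header is unreadable
    pub in_live_pool: bool,
}

/// Classify a recorded member against its pool.json membership key.
pub fn classify_member(membership_key: &str, obs: &Observation<'_>) -> MemberState {
    if !obs.device_present {
        return MemberState::Missing;
    }
    match obs.on_disk_uuid {
        Some(uuid) if uuid == membership_key => {
            if obs.in_live_pool {
                MemberState::Assembled
            } else {
                MemberState::Offline
            }
        }
        // Assumption: unreadable header or UUID mismatch is not classified here.
        _ => MemberState::Unknown,
    }
}
```

Keeping `Offline` free of any cause field matches the cause-neutral wording: the classifier reports what was verified, not why the member is outside the live pool.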
Limits And Non-Goals
- A LUKS UUID identifies an encrypted LUKS container, not a physical drive, enclosure slot, SATA port, or by-id path.
- A cloned LUKS header intentionally has the same UUID as its source. Braid treats that as a duplicate identity and rejects it; it does not invent a new member identity for the clone.
- Mapper and label drift are tolerated for correlation and cleanup, but braid does not silently rewrite drifted mapper names or labels back into the expected braid-<DiskName> form.
- devid remains btrfs state. It is allowed only as a prior binding for missing/null-underlying cases where btrfs can still identify a member but braid cannot currently observe the LUKS UUID.
- A member with neither observable LUKS UUID nor journaled/persisted devid is not recoverable by mapper-name inference. The right behavior is to preserve recovery state and require manual reconciliation.
- UUIDs are not a user-facing naming scheme. They may appear in diagnostics, pool.json, pending-op.json, and machine-readable output, but command selection and normal summaries should continue to use DiskName.
- lock’s mounted-fallback teardown unmounts the configured btrfs mount point (licensed by mount-point ownership, not an FSID identity match – braid persists no durable pool FSID to compare a probe against), then scans only /dev/mapper/braid-* and closes by backing LUKS UUID: verified member UUIDs close as members, verified non-member braid-* mappers close as orphans; non-braid-* devices and unverified candidates are skipped. The cleanup is scoped by the braid-* namespace plus UUID, not by which devices backed the unmounted filesystem. Consequence: a foreign btrfs at braid’s mount point would be unmounted (a non-destructive, EBUSY-safe umount with no -f/-l); a foreign filesystem normally sits on non-braid-* devices, so the realistic consequence is the unmount alone. This is accepted, and gating it would require a durable pool-FSID identity axis this decision deliberately omits to keep membership single-axis.
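
The devid fallback and its fail-closed limit can be sketched as a binding function used during recovery. `Binding`, `RecoverError`, and `bind_live_device` are hypothetical names, and the journal is reduced to a list of persisted devids for brevity.

```rust
#[derive(Debug, PartialEq)]
pub enum Binding {
    ByUuid(String),
    ByDevid(u64),
}

#[derive(Debug, PartialEq)]
pub enum RecoverError {
    /// Neither a live LUKS UUID nor a journaled devid ties this device to a member.
    NoIdentityEvidence { mapper: String },
}

/// Decide how a live btrfs device may be tied to a journaled member.
/// `mapper` is used for error reporting only and is never parsed for identity.
pub fn bind_live_device(
    mapper: &str,
    live_uuid: Option<&str>,
    live_devid: Option<u64>,
    journaled_devids: &[u64],
) -> Result<Binding, RecoverError> {
    if let Some(uuid) = live_uuid {
        return Ok(Binding::ByUuid(uuid.to_string()));
    }
    match live_devid {
        // Fallback: btrfs still reports the device and the journal persisted its devid.
        Some(d) if journaled_devids.contains(&d) => Ok(Binding::ByDevid(d)),
        // Fail closed: preserve recovery state and require manual reconciliation.
        _ => Err(RecoverError::NoIdentityEvidence {
            mapper: mapper.to_string(),
        }),
    }
}
```

The error path is the point of the sketch: a mapper named braid-disk1 with no UUID and no journaled devid produces an error, not a binding to disk1.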
Tests That Enforce This
- cli/src/membership.rs unit tests pin UUID-keyed pool.json, reject stale value-side luks_uuid, and enforce duplicate checks across UUID, name, by-id, and devid axes.
- cli/src/types.rs and cli/src/cmd.rs unit tests reject user-supplied --uuid/--label extras and pin the structured cryptsetup luksFormat --uuid <uuid> --label <label> argv order.
- cli/src/status.rs unit tests pin compact status names by resolving live pool UUIDs back to DiskName, including a drifted mapper case.
- cli/src/status.rs and cli/src/tui/probe.rs unit tests pin that a present, LUKS-identity-verified member absent from the live pool renders offline, not missing or unknown.
- cli/src/doctor.rs unit tests pin that declared_disks renders verified members absent from the live pool as cause-neutral Warn, keeps UUID mismatches as Fail, preserves offline-pool identity-only behavior, and warns when mounted-pool topology cannot be probed.
- tests/cli/braid-status-rust.py pins that present disks’ rendered luks_uuid equals the real cryptsetup UUID and the pool.json membership key, and that name is the operator name, in intact and degraded states.
- tests/cli/braid-status-rust.py pins that a degraded mount with one closed verified member renders that member as OFFLINE in human output and offline in JSON while the pool summary remains degraded.
- tests/cli/braid-doctor-offline-member.py pins that a degraded mounted pool with one closed verified member makes declared_disks warn with offline wording, while a fully assembled pool and an offline pool remain Ok.
- tests/cli/status-mapper-drift.py pins that braid status resolves the operator name via the UUID join when a member is open under a drifted mapper (braid-WRONG), not the mapper basename, in both JSON and human output.
- cli/src/tui/probe.rs unit tests pin the TUI Data-tab Bus column’s transport join to the parent disk’s LUKS UUID, so a member open under a drifted mapper (braid-WRONG) still renders its bus instead of degrading to --.
- cli/src/tui/probe.rs unit tests pin that the disk-detail LUKS metadata dump reads the live backing path for a verified-present member (surviving by-id drift), and that a foreign / ownership-unverified mapper does not surface the live device’s metadata under the declared disk.
- cli/src/tui/probe.rs and cli/src/tui/browse/state.rs unit tests pin that the TUI Browse SMART picker resolves a verified-present member through its live backing path (PoolState.disk_underlying, shared with the Data-tab SMART loop) and an offline member through its persisted by-id handle, so the two SMART surfaces cannot disagree under by-id drift. tests/cli/braid-tui-browse.py pins the live /dev/vd* node end-to-end for a present, unlocked member.
- cli/src/lock.rs unit tests pin the normal UUID/devid-classified close set, observed-mapper closing, UUID-scanned fallback cleanup, orphan warnings for non-member UUID/devid cases, duplicate-devid null_underlying skip behavior, and skip warnings for unverified candidates.
- cli/src/remove.rs unit tests pin all live member devids into the pre-operation journal snapshot before mutation, so recovery has a legitimate fallback binding when LUKS UUID is not observable.
- cli/src/recover.rs unit tests verify recovery refuses a null-underlying member when the journal lacks both observable UUID and persisted devid, instead of falling back to mapper-name inference.
- cli/src/enroll_key_file.rs unit tests verify standalone enroll rejects a member whose live LUKS UUID does not match the pool.json membership key before any slot inventory or keyfile mutation runs. Enroll also re-probes each member’s live UUID again at its mutation boundary, after the passphrase prompt and before luksAddKey, to catch a disk swapped or reformatted during the prompt window: unit tests pin the standalone re-probe’s mismatch and fail-closed arms and the discovery->execute window closure (a swap that passes discovery is rejected at execute before any keyfile is enrolled or generated).
- cli/src/replace.rs unit tests verify ReplacePlan::execute re-probes the live pool before journal write, rejects unmounted/FSID-drifted/colliding live-pool state, and still proceeds when the fresh probe is clean.
- cli/src/recover.rs unit tests verify post-maintenance replace recovery re-probes the old mapper UUID before close, skips foreign mappers, and still closes owned active dm mappings without relying on /dev/mapper path nodes.
- cli/src/luks.rs and cli/src/probe.rs unit tests verify already-open expected mappers must have the requested backing path before UUID ownership is accepted.
- tests/cli/luks-mapper-drift.py verifies braid lock closes the observed drifted mapper owned by a member UUID.
- tests/cli/luks-lock-skipped-no-false-closed.py verifies skipped mapper uncertainty does not produce false "already closed" rows.
- tests/cli/unlock-uuid-mismatch.py, tests/cli/enroll-uuid-mismatch.py, and tests/cli/recover-replace-existing-luks-uuid-mismatch.py verify swapped or reformatted disks fail UUID re-checks before unsafe replay, slot enrollment, or mount.
- tests/cli/replace-new-in-pool-guard.py verifies duplicate LUKS UUIDs are rejected before braid writes membership or calls into btrfs mutation.
- tests/cli/replace-live-pool-collision-race-rejected.py verifies replace’s execute-time live-pool re-probe rejects a cloned replacement UUID added to the mounted pool while replace waits for confirmation.
- tests/cli/braid-add-cloned-luks-header-rejected.py and tests/cli/replace-cloned-luks-header-rejected.py verify cloned LUKS headers cannot make add or replace reuse a mapper opened from the wrong physical device.
- tests/cli/braid-add-persists-before-balance.py verifies fresh add writes canonical UUID-keyed membership, without a duplicate value-side luks_uuid, before post-add maintenance continues.
- tests/cli/braid-doctor-uuid-swap.py verifies braid doctor fails closed when a member’s live LUKS UUID diverges from its pool.json key, surfacing the swap before any mutating command runs.
Consequences
- pool.json key order is UUID order, not disk-name order. Display surfaces that need stable operator ordering must sort by DiskName.
- Recovery trusts journaled UUID-keyed membership snapshots for phase-specific replay and verifies live UUIDs again at mutation boundaries where a physical disk could have been swapped or reformatted.
- Mapper and label drift no longer break membership correlation, but drifted handles are not silently reconciled back into membership.
- Cloned disks with duplicate LUKS UUIDs are rejected before membership is written.
Rejected Alternatives
- Keep disk name as identity. Disk names are useful for humans but are not intrinsic to the encrypted device. Keeping them as identity preserves the label/mapper drift hazard.
- Use by-id as identity. by-id paths identify hardware slots/devices, not encrypted membership. They can change with enclosures or controller behavior, and they do not detect cloned LUKS headers.
- Use btrfs devid as identity. Devids are live filesystem state and are unavailable before mount. They remain useful only as fallback binding for missing or null-underlying devices.
See
- 017-runtime-disk-membership.md
- ../principles.md
- cli/src/membership.rs
- cli/src/journal.rs
- cli/src/recover.rs
- cli/src/lock.rs