CLI Reference
Every stax subcommand and flag. Defaults are stated explicitly. For
task-oriented walkthroughs, see the Guide.
stax <COMMAND> [OPTIONS]

Every subcommand except record and setup connects to stax-server over
its local socket and fails loudly if the daemon is not running.
Global options
stax is built on figue, so the
standard builtins are available before any subcommand:
| flag | effect |
|---|---|
| -h, --help | show help and exit 0 |
| --html-help | open HTML help in the browser and exit |
| -V, --version | show version and exit 0 |
| --completions <bash,zsh,fish> | print a shell completion script |
stax record
Start a recording. Forwards every event to stax-server for the web UI and
the query subcommands. See Recording a Run.
stax record [OPTIONS] [-- COMMAND…]

| flag / arg | type | default | meaning |
|---|---|---|---|
| -F, --frequency <HZ> | u32 | 900 | sampling frequency, in hertz |
| -l, --time-limit <SECS> | u64 | (none, unlimited) | stop after this many seconds |
| -p, --pid <PID> | u32 | (none) | attach to an existing process instead of launching one |
| --no-dwarf-unwind | bool | false | Linux x86-64: disable .eh_frame DWARF unwinding of user stacks |
| --daemon-socket <PATH> | String | /var/run/staxd.sock | local socket of the privileged staxd daemon |
| [-- COMMAND…] | positional | (none) | command to launch and profile; use -- to protect its flags |
You must supply exactly one of --pid or a launch command. Passing both
(stax record --pid 1 -- ./foo) or neither (a bare stax record) is an
error.
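The either-or rule can be checked before stax is invoked at all. The wrapper below is a minimal sketch: stax_record_checked is a hypothetical helper name, not something stax provides, and it relies only on the --pid flag and the -- separator documented above.

```shell
# Hypothetical wrapper (not part of stax): require exactly one of
# `--pid <PID>` or a launch command before calling `stax record`.
stax_record_checked() {
    if [ "${1-}" = "--pid" ]; then
        # Attach mode: a PID and nothing else.
        if [ "$#" -ne 2 ]; then
            echo "stax_record_checked: --pid takes one PID and no command" >&2
            return 2
        fi
        stax record --pid "$2"
    elif [ "$#" -gt 0 ]; then
        # Launch mode: `--` keeps the command's own flags away from stax.
        stax record -- "$@"
    else
        echo "stax_record_checked: need --pid <PID> or a command" >&2
        return 2
    fi
}
```

For example, stax_record_checked ./app --fast would run stax record -- ./app --fast, while stax_record_checked --pid 1 ./foo is rejected by the wrapper itself.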
On x86-64 Linux, .eh_frame DWARF unwinding is on by default because the
system libc is built with -fomit-frame-pointer, so the kernel's stack walk
truncates for any sample landing in it. --no-dwarf-unwind turns it off, as
does setting STAX_DWARF_UNWIND=0; the flag is a no-op on macOS and aarch64.
See Stack Unwinding.
--daemon-socket defaults to /var/run/staxd.sock; on Linux that resolves
to /run/staxd.sock, where sudo stax setup installs the systemd-managed
staxd. On Linux, if no staxd socket exists, stax records in-process.
stax setup
Codesign this stax binary or, when run as root, install staxd as a
LaunchDaemon. sudo stax setup is the privileged install step from
Getting Started.
stax setup [OPTIONS]

| flag | type | default | meaning |
|---|---|---|---|
| -y, --yes | bool | false | skip the confirmation prompt before codesign |
stax status
Print the current state of stax-server: the active run, if any, plus when
the daemon started. Takes no options. See
Run Lifecycle.
stax list
List every run stax-server has hosted, active and historical, oldest first.
Takes no options. History is held in server memory and does not survive a
daemon restart; to keep a run, save the current queryable run with
stax save. Use stax select-run to
restore a stopped history row into the current query state. See
Run Lifecycle.
stax wait
Block until a condition fires, the active run stops, or the timeout elapses. See Run Lifecycle.
stax wait [OPTIONS]

| flag | type | default | meaning |
|---|---|---|---|
| --for-samples <N> | u64 | (none) | return after at least N PET samples have landed |
| --for-seconds <N> | u64 | (none) | return after N seconds of wall-clock time |
| --until-symbol <S> | String | (none) | return once a symbol containing substring S is seen (case-sensitive) |
| --timeout-ms <MS> | u64 | (none) | hard deadline for the whole wait |
--for-samples, --for-seconds, and --until-symbol are mutually
exclusive: pass at most one. --timeout-ms is independent and can be
combined with any of them. With no flags, wait blocks until the active run
stops. For exit codes, see Exit Codes.
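A common use of wait is to gate a snapshot on some condition. The sketch below combines one condition flag with the independent --timeout-ms deadline. snapshot_after_symbol is a hypothetical helper name, and treating any non-zero exit from stax wait as "skip the snapshot" is an assumption; the actual exit-code meanings are listed under Exit Codes.

```shell
# Hypothetical helper: wait for a symbol to appear, then snapshot the run.
# $1 = symbol substring (case-sensitive), $2 = hard deadline in milliseconds
snapshot_after_symbol() {
    # ASSUMPTION: a non-zero exit from `stax wait` means "do not snapshot";
    # consult Exit Codes for what each status actually signals.
    stax wait --until-symbol "$1" --timeout-ms "$2" || return $?
    stax top --limit 10 --sort total
}
```

Called as snapshot_after_symbol parse_json 5000, it would wait up to five seconds for a symbol containing parse_json before printing the top ten entries by total time.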
stax stop
Ask stax-server to stop the active run cleanly and print the final summary.
Takes no options. Exits non-zero if there is no active run. See
Run Lifecycle.
stax save
Save the current or most recent queryable run to an archive. Paths ending in
.stax create a single-file facet-json package. Other paths create the v2
directory layout: manifest.json plus typed facet-json chunks
(aggregator.json, binaries.json, and target-ingest.json) plus an
append-friendly events.jsonl sidecar; copied text bytes live under blobs/.
The manifest/package records archive version, save time, producer/version,
OS/arch, and run summaries. The chunks or package store raw aggregator
streams, binary/symbol metadata, target-ingest diagnostics, typed
SavedEventLogEntry records, and any code-byte blobs needed by annotate.
New readers replay those records when present and keep the aggregate
chunks/package members as a compatibility and inspection path.
stax save <PATH>

| arg | type | meaning |
|---|---|---|
| <PATH> | positional String | directory archive to create, or .stax package file to write |
stax save works while a run is active, and after stax stop, until the
next recording resets the live aggregator.
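Because the archive form depends only on the path suffix, a script can predict it before saving. The helper below is a minimal sketch of that rule. archive_kind is a hypothetical name, and it assumes the suffix match is an exact, lowercase .stax, which this reference does not spell out.

```shell
# Hypothetical helper: report which archive form `stax save <PATH>` would
# produce, using only the documented ".stax suffix" rule.
# ASSUMPTION: the suffix comparison is exact and case-sensitive.
archive_kind() {
    case "$1" in
        *.stax) echo "package" ;;    # single-file facet-json package
        *)      echo "directory" ;;  # v2 layout: manifest.json plus chunks
    esac
}
```

A CI job could use this to decide whether to upload a single file or to tar a directory after stax save.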
Archive compatibility is strict in the current format: open and compare
accept v2 directory archives, .stax packages, and legacy v1 archive.json
archives, and reject other versions loudly.
stax open
Open a saved run archive into stax-server's current query state.
stax open <PATH>

| arg | type | meaning |
|---|---|---|
| <PATH> | positional String | archive directory, .stax package, v2 manifest.json, or legacy v1 archive.json |
After stax open, threads, top, flame, and diagnose operate on the
restored run. V2 archives replay events.jsonl or embedded package events
when present; legacy and minimal archives fall back to aggregate chunks.
open refuses to replace state while a recording is active.
stax select-run
Restore one stopped in-memory run from stax list into stax-server's
current query state.
stax select-run <RUN_ID>

| arg | type | meaning |
|---|---|---|
| <RUN_ID> | positional u64 | run id from stax list |
After select-run, threads, top, flame, annotate, and diagnose
operate on that run. It refuses to replace state while a recording is active.
This is server-memory history, not persistence; save restart-safe artifacts
with stax save and restore them with stax open.
The reporting commands also accept --run <RUN_ID> for non-mutating one-off
queries of stopped in-memory runs.
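As an illustration of the non-mutating path, the sketch below queries one stopped run with --run instead of calling select-run first. report_run is a hypothetical helper name, and the limits chosen are arbitrary; the flags themselves are the ones documented in the threads, top, and flame sections of this reference.

```shell
# Hypothetical helper: print a three-part report for one stopped run
# without changing the server's selected query state.
# $1 = run id as shown by `stax list`
report_run() {
    # Every thread and synthetic lane (0 disables the row limit).
    stax threads --run "$1" -n 0 || return
    # Ten hottest functions, counting time in any frame.
    stax top --run "$1" --sort total -n 10 || return
    # A shallow flame tree that hides subtrees under 2 percent.
    stax flame --run "$1" --max-depth 8 --threshold-pct 2
}
```

The helper stops at the first command that fails and returns that command's exit status.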
stax compare
Compare two saved run archives without touching stax-server state.
stax compare [OPTIONS] <BASELINE> <CANDIDATE>

| flag / arg | type | default | meaning |
|---|---|---|---|
| --json | bool | false | print a machine-readable facet-json report |
| --fail-active-delta-ms <MS> | f64 | (none) | fail if candidate active time increases past this |
| --fail-target-delta-ms <MS> | f64 | (none) | fail if candidate target time increases past this |
| --fail-off-cpu-delta-ms <MS> | f64 | (none) | fail if candidate off-CPU time increases past this |
| --fail-target-delta-pct <PCT> | f64 | (none) | fail if candidate target time increases past this percent |
| --fail-unlinked-origins-delta <COUNT> | u64 | (none) | fail if unlinked-origin count increases past this |
| --fail-missing-origins-delta <COUNT> | u64 | (none) | fail if missing-origin count increases past this |
| --fail-bad-duration-drops-delta <COUNT> | u64 | (none) | fail if bad-duration drops increase past this |
| --fail-target-queue-drops-delta <COUNT> | u64 | (none) | fail if target-side queue drops increase past this |
| --fail-worker-disconnect-drops-delta <COUNT> | u64 | (none) | fail if worker-disconnect drops increase past this |
| <BASELINE> | positional String | (required) | baseline archive directory, .stax package, v2 manifest, or legacy v1 archive.json |
| <CANDIDATE> | positional String | (required) | candidate archive directory, .stax package, v2 manifest, or legacy v1 archive.json |
The comparison reads each archive directly and prints deltas for PET samples,
on/off-CPU interval time, target time, target span counts, origin-link counts,
ingest drops, and the top target lanes by duration. V2 inputs use the same
event-replay preference as stax open. --json emits the same comparison as
named baseline/candidate/delta fields for CI and benchmark notes. Threshold
flags fail the command when a positive candidate delta exceeds the limit;
those failures are also reported as threshold_failures in JSON.
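The threshold flags make compare usable as a CI gate. The sketch below wraps one such invocation. perf_gate is a hypothetical helper name, the 5 percent and zero-drop limits are placeholder values rather than recommendations, and it assumes a threshold failure surfaces as a non-zero exit status.

```shell
# Hypothetical CI gate: compare a candidate archive against a baseline.
# $1 = baseline archive, $2 = candidate archive
# The limits below are placeholders; pick values that suit your workload.
perf_gate() {
    stax compare --json \
        --fail-target-delta-pct 5 \
        --fail-target-queue-drops-delta 0 \
        "$1" "$2"
}
```

A pipeline step might run perf_gate baseline.stax candidate.stax > report.json and inspect threshold_failures in the JSON when the step fails.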
stax top
Snapshot the top-N functions or target-span names of the current run. See Inspecting a Run.
stax top [OPTIONS]

| flag | type | default | meaning |
|---|---|---|---|
| --run <RUN_ID> | u64 | (none) | query a run without changing selected query state |
| -n, --limit <N> | u32 | 20 | maximum number of entries to return |
| --sort <MODE> | String | self | self (leaf-only) or total (any frame) |
| --tid <TID> | u32 | (none) | restrict to one thread; default is all threads |
Output columns are active time, target-executor time, PET samples, target
span count, and function/span name. For a synthetic target lane,
--tid <TID> shows per-span durations in target ms and span counts in
spans. For a target-only ranking that aggregates across origins, use
stax target top. When target spans carry origins, filtering
to the origin CPU tid also includes the matching target lane work as
provenance-linked parallel work; it does not turn GPU/accelerator time into CPU
execution under the dispatch stack. If Metal command/dispatch frames are
visible but no target lane is present, stax top prints a stderr hint about
explicit stax-target / Lane::metal Metal 4 timestamp cooperation. If the
view is empty but the run has off-CPU/thread activity or target lanes outside a
--tid filter, top prints a discovery hint for stax threads -n 0,
target-lane tids, or stax-target integration.
stax flame
Print the active flamegraph as an indented tree. See Inspecting a Run.
stax flame [OPTIONS]

| flag | type | default | meaning |
|---|---|---|---|
| --run <RUN_ID> | u64 | (none) | query a run without changing selected query state |
| -d, --max-depth <N> | usize | 12 | stop printing below depth N; cut subtrees collapse to a summary |
| --threshold-pct <PCT> | f64 | 1.0 | hide subtrees below this percent of total active time; 0 for all |
| --tid <TID> | u32 | (none) | restrict to one thread; default is all threads |
Cooperating target lanes render as (all) -> lane -> span name, with per-node
active time, target time, span count, and percent columns. When target spans
carry origins, --tid <cpu tid> keeps the lane tree and filters it to work
linked to that CPU origin. Like top, flame prints a Metal
cooperation hint when Metal command/dispatch frames are visible but no
synthetic target lane has reported spans. Empty flame views also get the same
threads / target-lane / stax-target discovery hints as top.
stax threads
Per-thread and synthetic-lane CPU/target/off-CPU breakdown for the current
run, sorted by total activity. CPU thread rows include origin-linked target
span duration queued from that thread as provenance-linked parallel work, and
synthetic target lanes with spans are included even if they fall past the
normal -n cutoff. The output includes a
kind column: thread for real sampled threads and target for synthetic
target lanes. See
Inspecting a Run.
stax threads [OPTIONS]

| flag | type | default | meaning |
|---|---|---|---|
| --run <RUN_ID> | u64 | (none) | query a run without changing selected query state |
| -n, --limit <N> | u32 | 20 | maximum threads to print; 0 prints every thread |
stax target
Inspect cooperating target lanes and target span/shader names directly. These commands are the CLI discovery points for questions like "which GPU lane exists?", "which shader/span took the most time?", and "which shader/span ran most often?" They use the same target-span aggregate as the web target details panel, and they keep target work parallel instead of pretending it is CPU stack execution.
stax target lanes
stax target lanes [OPTIONS]

| flag | type | default | meaning |
|---|---|---|---|
| --run <RUN_ID> | u64 | (none) | query a run without changing selected query state |
| -n, --limit <N> | u32 | 20 | maximum target lanes to print; 0 prints all |
Output columns are exact target time, span count, lane kind, synthetic tid, and
lane name. It is equivalent to the target-only subset of stax threads, sorted
by target time.
stax target top
stax target top [OPTIONS]

| flag | type | default | meaning |
|---|---|---|---|
| --run <RUN_ID> | u64 | (none) | query a run without changing selected query state |
| -n, --limit <N> | u32 | 20 | maximum span/shader rows to print; 0 prints all |
| --by <MODE> | String | time | time, count, avg, or max |
| --tid <TID> | u32 | (none) | filter to a target lane tid or origin-linked CPU tid |
Rows aggregate by lane + span/shader name across origin groups, so one kernel does not split into many rows merely because it was dispatched from several CPU stacks. Columns are total target time, invocation count, average duration, max duration, lane kind, lane name, and span/shader name.
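The three discovery questions from the stax target overview map onto three invocations. The sketch below strings them together. target_overview is a hypothetical helper name, and the row limit of 5 is arbitrary.

```shell
# Hypothetical helper: answer the three target-discovery questions in order.
target_overview() {
    # Which lanes exist? (0 prints all lanes)
    stax target lanes -n 0 || return
    # Which span/shader took the most time?
    stax target top --by time -n 5 || return
    # Which span/shader ran most often?
    stax target top --by count -n 5
}
```

Swapping --by count for --by avg or --by max would surface spans that are individually slow rather than frequent.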
stax annotate
Disassemble and annotate one function from the current run. See Inspecting a Run.
stax annotate <TARGET> [OPTIONS]

| flag / arg | type | default | meaning |
|---|---|---|---|
| <TARGET> | positional String | (required) | hex address (0x10004ad60) or a substring of a demangled name |
| --run <RUN_ID> | u64 | (none) | query a run without changing selected query state |
| --tid <TID> | u32 | (none) | restrict to one thread; default is all threads |
A name substring is matched case-insensitively against the run's top-256 leaf-self functions; the hottest match wins.
stax diagnose
Dump stax-server diagnostics, including target-span ingest counters
(batches, recorded/dropped spans, lane totals, origin link/unlink counts,
unlinked-origin reasons, PET origin-distance min/avg/max, typed target metadata
record counts, and stax-target local queue drops). It also prints target-ingest
hints for missing batches, invalid span durations, missing origins, origins that
failed to link, metadata records that arrive without executable spans, missing
source/shader pairing, counter definitions without samples, batches that arrived
with no active run, batches from the wrong pid, and target-side queue-full /
worker-disconnected drops. See
Troubleshooting.
stax diagnose [OPTIONS]

| flag | type | default | meaning |
|---|---|---|---|
| --run <RUN_ID> | u64 | (none) | query a run without changing selected query state |
stax dump
Ask every running stax process (staxd, stax-server, stax) to write a
SIGUSR1 telemetry/debug snapshot into unified logging. Takes no options. See
Troubleshooting.