CLI Reference

Every stax subcommand and flag. Defaults are stated explicitly. For task-oriented walkthroughs, see the Guide.

```text
stax <COMMAND> [OPTIONS]
```

Every subcommand except record and setup connects to stax-server over its local socket and fails loudly if the daemon is not running.

Global options

stax is built on figue, so the standard builtins are available before any subcommand:

| flag | effect |
| --- | --- |
| `-h, --help` | show help and exit 0 |
| `--html-help` | open HTML help in the browser and exit |
| `-V, --version` | show version and exit 0 |
| `--completions <bash,zsh,fish>` | print a shell completion script |

stax record

Start a recording. Forwards every event to stax-server for the web UI and the query subcommands. See Recording a Run.

```text
stax record [OPTIONS] [-- COMMAND…]
```

| flag / arg | type | default | meaning |
| --- | --- | --- | --- |
| `-F, --frequency <HZ>` | u32 | 900 | sampling frequency, in hertz |
| `-l, --time-limit <SECS>` | u64 | (none, unlimited) | stop after this many seconds |
| `-p, --pid <PID>` | u32 | (none) | attach to an existing process instead of launching one |
| `--no-dwarf-unwind` | bool | false | Linux x86-64: disable .eh_frame DWARF unwinding of user stacks |
| `--daemon-socket <PATH>` | String | /var/run/staxd.sock | local socket of the privileged staxd daemon |
| `[-- COMMAND…]` | positional | (none) | command to launch and profile; use -- to protect its flags |

Supply exactly one of --pid or a launch command. Passing both (stax record --pid 1 -- ./foo) or neither (a bare stax record) is an error.

On x86-64 Linux, .eh_frame DWARF unwinding is on by default: the system libc is built with -fomit-frame-pointer, so the kernel's stack walk is truncated for any sample that lands in it. --no-dwarf-unwind turns it off (as does STAX_DWARF_UNWIND=0); the flag is a no-op on macOS and aarch64. See Stack Unwinding.

--daemon-socket defaults to /var/run/staxd.sock; on Linux that resolves to /run/staxd.sock, where sudo stax setup installs the systemd-managed staxd. On Linux, if no staxd socket exists, stax records in-process.
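The two valid shapes of stax record can be sketched as a pair of shell wrappers. The function names profile_launch and profile_attach are hypothetical, and the 30-second limit is an arbitrary choice; only flags from the table above are used.

```shell
# Hypothetical helpers; they are not part of stax.

# Launch a command under the profiler for at most 30 seconds.
# Everything after `--` belongs to the profiled command, not to stax.
profile_launch() {
    stax record --frequency 900 --time-limit 30 -- "$@"
}

# Attach to an already-running process instead of launching one.
# $1 = pid of the process to profile.
profile_attach() {
    stax record --pid "$1"
}
```

For example, `profile_launch ./server --port 8080` keeps `--port` away from stax because it follows `--`.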

stax setup

Codesign this stax binary or, when run as root, install staxd as a system service: a LaunchDaemon on macOS, or the systemd-managed staxd on Linux. sudo stax setup is the privileged install step from Getting Started.

```text
stax setup [OPTIONS]
```

| flag | type | default | meaning |
| --- | --- | --- | --- |
| `-y, --yes` | bool | false | skip the confirmation prompt before codesign |

stax status

Print the current state of stax-server: the active run, if any, plus when the daemon started. Takes no options. See Run Lifecycle.

stax list

List every run stax-server has hosted, active and historical, oldest first. Takes no options. History is held in server memory and does not survive a daemon restart unless you save the current queryable run with stax save. Use stax select-run to restore a stopped history row into the current query state. See Run Lifecycle.

stax wait

Block until a condition fires, the active run stops, or the timeout elapses. See Run Lifecycle.

```text
stax wait [OPTIONS]
```

| flag | type | default | meaning |
| --- | --- | --- | --- |
| `--for-samples <N>` | u64 | (none) | return after at least N PET samples have landed |
| `--for-seconds <N>` | u64 | (none) | return after N seconds of wall-clock time |
| `--until-symbol <S>` | String | (none) | return once a symbol containing substring S is seen (case-sensitive) |
| `--timeout-ms <MS>` | u64 | (none) | hard deadline for the whole wait |

--for-samples, --for-seconds, and --until-symbol are mutually exclusive: pass at most one. --timeout-ms is independent and can accompany any of them. With no flags, wait blocks until the active run stops. For exit statuses, see Exit Codes.
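A minimal sketch of pairing one condition flag with a deadline, which the rule above permits. wait_for_samples is a hypothetical helper name, and the values in the usage note are arbitrary.

```shell
# Hypothetical helper: return once N samples have landed, the active
# run stops, or the deadline passes.
# $1 = sample count, $2 = deadline in milliseconds.
wait_for_samples() {
    # --for-samples is the single condition flag; --timeout-ms sits
    # outside the mutually exclusive group, so it can be added freely.
    stax wait --for-samples "$1" --timeout-ms "$2"
}
```

`wait_for_samples 5000 60000` would wait for 5000 samples but give up after 60 seconds. How a timeout is told apart from a satisfied condition is covered under Exit Codes.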

stax stop

Ask stax-server to stop the active run cleanly and print the final summary. Takes no options. Exits non-zero if there is no active run. See Run Lifecycle.

stax save

Save the current or most recent queryable run to an archive. The path decides the layout. Paths ending in .stax create a single-file facet-json package. Other paths create the v2 directory layout: manifest.json, typed facet-json chunks (aggregator.json, binaries.json, and target-ingest.json), and an append-friendly events.jsonl sidecar, with copied text bytes under blobs/.

The manifest (or package) records the archive version, save time, producer/version, OS/arch, and run summaries. The chunks (or package members) store raw aggregator streams, binary/symbol metadata, target-ingest diagnostics, typed SavedEventLogEntry records, and any code-byte blobs needed by annotate. New readers replay the SavedEventLogEntry records when present and keep the aggregate chunks/package members as a compatibility and inspection path.

```text
stax save <PATH>
```

| arg | type | meaning |
| --- | --- | --- |
| `<PATH>` | positional String | directory archive to create, or .stax package file to write |

stax save works while a run is active, and after stax stop, until the next recording resets the live aggregator.

Archive compatibility is strict in the current format: open and compare accept v2 directory archives, .stax packages, and legacy v1 archive.json archives, and reject other versions loudly.
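The path rule for stax save can be sketched in shell. archive_kind and save_run are hypothetical helpers, not part of stax; they only mirror the suffix rule described above, and the sketch assumes the .stax suffix is matched case-sensitively.

```shell
# Mirror of the documented rule: a path ending in .stax yields a
# single-file package, any other path a v2 directory archive.
# Assumption: the suffix match is case-sensitive.
archive_kind() {
    case "$1" in
        *.stax) echo "package" ;;
        *)      echo "directory" ;;
    esac
}

# Hypothetical wrapper: announce what will be written, then save.
save_run() {
    echo "saving $(archive_kind "$1") archive to $1"
    stax save "$1"
}
```

`save_run baseline.stax` would write one package file, while `save_run runs/baseline` would create a directory containing manifest.json and the chunk files.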

stax open

Open a saved run archive into stax-server's current query state.

```text
stax open <PATH>
```

| arg | type | meaning |
| --- | --- | --- |
| `<PATH>` | positional String | archive directory, .stax package, v2 manifest.json, or legacy v1 archive.json |

After stax open, threads, top, flame, and diagnose operate on the restored run. V2 archives replay events.jsonl or embedded package events when present; legacy and minimal archives fall back to aggregate chunks. open refuses to replace state while a recording is active.

stax select-run

Restore one stopped in-memory run from stax list into stax-server's current query state.

```text
stax select-run <RUN_ID>
```

| arg | type | meaning |
| --- | --- | --- |
| `<RUN_ID>` | positional u64 | run id from stax list |

After select-run, threads, top, flame, annotate, and diagnose operate on that run. It refuses to replace state while a recording is active. This is server-memory history, not persistence; save restart-safe artifacts with stax save and restore them with stax open. The reporting commands also accept --run <RUN_ID> for non-mutating one-off queries of stopped in-memory runs.
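The two ways of addressing a stopped in-memory run can be contrasted in a short sketch. The helper names are hypothetical, the run id is whatever stax list reports, and the second helper assumes a refused select-run is signalled by a non-zero exit status.

```shell
# One-off, non-mutating query: --run leaves the selected query
# state untouched.
# $1 = run id from `stax list`.
peek_run() {
    stax top --run "$1" --limit 5
}

# Mutating: make the run current, after which reporting commands
# need no --run flag. `&&` skips the report if select-run fails
# (assumed to happen, for example, while a recording is active).
switch_and_inspect() {
    stax select-run "$1" && stax threads --limit 0
}
```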

stax compare

Compare two saved run archives without touching stax-server state.

```text
stax compare [OPTIONS] <BASELINE> <CANDIDATE>
```

| flag / arg | type | default | meaning |
| --- | --- | --- | --- |
| `--json` | bool | false | print a machine-readable facet-json report |
| `--fail-active-delta-ms <MS>` | f64 | (none) | fail if candidate active time increases past this |
| `--fail-target-delta-ms <MS>` | f64 | (none) | fail if candidate target time increases past this |
| `--fail-off-cpu-delta-ms <MS>` | f64 | (none) | fail if candidate off-CPU time increases past this |
| `--fail-target-delta-pct <PCT>` | f64 | (none) | fail if candidate target time increases past this percent |
| `--fail-unlinked-origins-delta <COUNT>` | u64 | (none) | fail if unlinked-origin count increases past this |
| `--fail-missing-origins-delta <COUNT>` | u64 | (none) | fail if missing-origin count increases past this |
| `--fail-bad-duration-drops-delta <COUNT>` | u64 | (none) | fail if bad-duration drops increase past this |
| `--fail-target-queue-drops-delta <COUNT>` | u64 | (none) | fail if target-side queue drops increase past this |
| `--fail-worker-disconnect-drops-delta <COUNT>` | u64 | (none) | fail if worker-disconnect drops increase past this |
| `<BASELINE>` | positional String | (required) | baseline archive directory, .stax package, v2 manifest, or legacy v1 archive.json |
| `<CANDIDATE>` | positional String | (required) | candidate archive directory, .stax package, v2 manifest, or legacy v1 archive.json |

The comparison reads each archive directly and prints deltas for PET samples, on/off-CPU interval time, target time, target span counts, origin-link counts, ingest drops, and the top target lanes by duration. V2 inputs use the same event-replay preference as stax open. --json emits the same comparison as named baseline/candidate/delta fields for CI and benchmark notes. Threshold flags fail the command when a positive candidate delta exceeds the limit; those failures are also reported as threshold_failures in JSON.
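For CI use, the threshold flags can be wrapped in a small gate. This is a sketch under assumptions: perf_gate is a hypothetical name, the 5 percent and zero-drop limits are arbitrary, and it assumes a threshold failure surfaces as a non-zero exit status (see Exit Codes).

```shell
# Hypothetical CI gate around `stax compare`.
# $1 = baseline archive, $2 = candidate archive.
perf_gate() {
    # Illustrative limits: fail on more than 5% extra target time or
    # on any new target-side queue drops. The JSON report goes to
    # stdout; the verdict goes to stderr so it does not pollute it.
    if stax compare --json \
        --fail-target-delta-pct 5 \
        --fail-target-queue-drops-delta 0 \
        "$1" "$2"; then
        echo "compare: within thresholds" >&2
    else
        # Assumption: a threshold failure yields a non-zero status.
        echo "compare: threshold exceeded or compare failed" >&2
        return 1
    fi
}
```

A pipeline step could then run `perf_gate baseline.stax candidate.stax > report.json` and let the function's return status decide whether the job passes.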

stax top

Snapshot the top-N functions or target-span names of the current run. See Inspecting a Run.

```text
stax top [OPTIONS]
```

| flag | type | default | meaning |
| --- | --- | --- | --- |
| `--run <RUN_ID>` | u64 | (none) | query a run without changing selected query state |
| `-n, --limit <N>` | u32 | 20 | maximum number of entries to return |
| `--sort <MODE>` | String | self | self (leaf-only) or total (any frame) |
| `--tid <TID>` | u32 | (none) | restrict to one thread; default is all threads |

Output columns are active time, target-executor time, PET samples, target span count, and function/span name. For a synthetic target lane, --tid <TID> shows per-span durations in target ms and span counts in spans. For a target-only ranking that aggregates across origins, use stax target top.

When target spans carry origins, filtering to the origin CPU tid also includes the matching target lane work as provenance-linked parallel work; it does not turn GPU/accelerator time into CPU execution under the dispatch stack.

If Metal command/dispatch frames are visible but no target lane is present, stax top prints a stderr hint about explicit stax-target / Lane::metal Metal 4 timestamp cooperation. If the view is empty but the run has off-CPU/thread activity or target lanes outside a --tid filter, top prints a discovery hint for stax threads -n 0, target-lane tids, or stax-target integration.
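Two common stax top invocations, sketched as hypothetical wrappers. The function names and the limit of 10 are illustrative choices, not defaults.

```shell
# Rank by total time (any frame) instead of the default self time.
top_inclusive() {
    stax top --limit 10 --sort total
}

# Restrict the ranking to one thread or synthetic target lane.
# $1 = tid to filter on.
top_for_thread() {
    stax top --limit 10 --tid "$1"
}
```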

stax flame

Print the active flamegraph as an indented tree. See Inspecting a Run.

```text
stax flame [OPTIONS]
```

| flag | type | default | meaning |
| --- | --- | --- | --- |
| `--run <RUN_ID>` | u64 | (none) | query a run without changing selected query state |
| `-d, --max-depth <N>` | usize | 12 | stop printing below depth N; cut subtrees collapse to a summary |
| `--threshold-pct <PCT>` | f64 | 1.0 | hide subtrees below this percent of total active time; 0 for all |
| `--tid <TID>` | u32 | (none) | restrict to one thread; default is all threads |

Cooperating target lanes render as (all) -> lane -> span name, with per-node active time, target time, span count, and percent columns. When target spans carry origins, --tid <cpu tid> keeps the lane tree and filters it to work linked to that CPU origin. Like top, flame prints a Metal cooperation hint when Metal command/dispatch frames are visible but no synthetic target lane has reported spans. Empty flame views also get the same threads / target-lane / stax-target discovery hints as top.

stax threads

Per-thread and synthetic-lane CPU/target/off-CPU breakdown for the current run, sorted by total activity. CPU thread rows include origin-linked target span duration queued from that thread as provenance-linked parallel work, and synthetic target lanes with spans are included even if they fall past the normal -n cutoff. The output includes a kind column: thread for real sampled threads and target for synthetic target lanes. See Inspecting a Run.

```text
stax threads [OPTIONS]
```

| flag | type | default | meaning |
| --- | --- | --- | --- |
| `--run <RUN_ID>` | u64 | (none) | query a run without changing selected query state |
| `-n, --limit <N>` | u32 | 20 | maximum threads to print; 0 prints every thread |

stax target

Inspect cooperating target lanes and target span/shader names directly. These commands are the CLI discovery points for questions like "which GPU lane exists?", "which shader/span took the most time?", and "which shader/span ran most often?" They use the same target-span aggregate as the web target details panel, and they keep target work parallel instead of pretending it is CPU stack execution.

stax target lanes

```text
stax target lanes [OPTIONS]
```

| flag | type | default | meaning |
| --- | --- | --- | --- |
| `--run <RUN_ID>` | u64 | (none) | query a run without changing selected query state |
| `-n, --limit <N>` | u32 | 20 | maximum target lanes to print; 0 prints all |

Output columns are exact target time, span count, lane kind, synthetic tid, and lane name. It is equivalent to the target-only subset of stax threads, sorted by target time.

stax target top

```text
stax target top [OPTIONS]
```

| flag | type | default | meaning |
| --- | --- | --- | --- |
| `--run <RUN_ID>` | u64 | (none) | query a run without changing selected query state |
| `-n, --limit <N>` | u32 | 20 | maximum span/shader rows to print; 0 prints all |
| `--by <MODE>` | String | time | time, count, avg, or max |
| `--tid <TID>` | u32 | (none) | filter to a target lane tid or origin-linked CPU tid |

Rows aggregate by lane + span/shader name across origin groups, so one kernel does not split into many rows merely because it was dispatched from several CPU stacks. Columns are total target time, invocation count, average duration, max duration, lane kind, lane name, and span/shader name.
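The discovery questions from the start of this section map onto the two subcommands roughly as follows. The helper names are hypothetical, and the row limit of 5 is arbitrary.

```shell
# "Which GPU lane exists?" List every target lane; a limit of 0
# removes the default 20-row cutoff.
list_lanes() {
    stax target lanes --limit 0
}

# "Which shader/span ran most often?" Rank one lane's spans by
# invocation count.
# $1 = synthetic tid taken from the lanes listing.
busiest_spans() {
    stax target top --by count --limit 5 --tid "$1"
}
```

Swapping `--by count` for `--by time`, `avg`, or `max` answers the duration-oriented variants of the same question.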

stax annotate

Disassemble and annotate one function from the current run. See Inspecting a Run.

```text
stax annotate <TARGET> [OPTIONS]
```

| flag / arg | type | default | meaning |
| --- | --- | --- | --- |
| `<TARGET>` | positional String | (required) | hex address (0x10004ad60) or a substring of a demangled name |
| `--run <RUN_ID>` | u64 | (none) | query a run without changing selected query state |
| `--tid <TID>` | u32 | (none) | restrict to one thread; default is all threads |

A name substring is matched case-insensitively against the run's top-256 leaf-self functions; the hottest match wins.

stax diagnose

Dump stax-server diagnostics, including target-span ingest counters (batches, recorded/dropped spans, lane totals, origin link/unlink counts, unlinked-origin reasons, PET origin-distance min/avg/max, typed target metadata record counts, and stax-target local queue drops). It also prints target-ingest hints for missing batches, invalid span durations, missing origins, origins that failed to link, metadata records that arrive without executable spans, missing source/shader pairing, counter definitions without samples, batches that arrived with no active run, batches from the wrong pid, and target-side queue-full / worker-disconnected drops. See Troubleshooting.

```text
stax diagnose [OPTIONS]
```

| flag | type | default | meaning |
| --- | --- | --- | --- |
| `--run <RUN_ID>` | u64 | (none) | query a run without changing selected query state |

stax dump

Ask every running stax process (staxd, stax-server, stax) to write a SIGUSR1 telemetry/debug snapshot into unified logging. Takes no options. See Troubleshooting.