@arizeai/phoenix-cli

Phoenix CLI is a command-line interface for your Phoenix projects. Fetch traces, list datasets, export experiment results, and access prompts directly from your terminal—or pipe them into AI coding agents like Claude Code, Cursor, Codex, and Gemini CLI. You can use Phoenix CLI for:

Immediate Debugging: Fetch the most recent trace of a failed or unexpected run with a single command
Bulk Export: Export large numbers of traces or experiment results to JSON files for offline analysis
Dataset & Experiment Access: List datasets and retrieve full experiment data including runs, evaluations, and trace IDs
Prompt Introspection: View and export prompt templates for analysis, optimization, or use with other tools
Terminal Workflows: Integrate trace and experiment data into your existing tools, piping output to Unix utilities like jq
AI Coding Assistants: Use with Claude Code, Cursor, Windsurf, or other AI-powered tools to analyze traces, experiments, and optimize prompts

Don’t see a use-case covered? @arizeai/phoenix-cli is open-source! Issues and PRs welcome.

Installation

npm install -g @arizeai/phoenix-cli

Or run directly with npx:

npx @arizeai/phoenix-cli

Quick Start

# Configure your Phoenix instance
export PHOENIX_HOST=http://localhost:6006
export PHOENIX_PROJECT=my-project
export PHOENIX_API_KEY=your-api-key  # if authentication is enabled

# Fetch the most recent trace
px trace list --limit 1

# Fetch a specific trace by ID
px trace get abc123def456

# Fetch recent LLM spans
px span list --span-kind LLM --limit 10

# Export traces to a directory
px trace list ./my-traces --limit 50

Environment Variables

Variable	Description
`PHOENIX_HOST`	Phoenix API endpoint (e.g., `http://localhost:6006`)
`PHOENIX_PROJECT`	Project name or ID
`PHOENIX_API_KEY`	API key for authentication (if required)
`PHOENIX_CLIENT_HEADERS`	Custom headers as JSON string

CLI flags take priority over environment variables.
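
The precedence rule can be pictured as a simple fallback chain. The sketch below only illustrates that rule in shell; it is not the CLI's actual implementation, and `resolve_endpoint` is a hypothetical helper name introduced here.

```shell
# Hypothetical helper illustrating the precedence rule (not part of px).
# $1 is the value that would be passed via --endpoint; it may be empty.
resolve_endpoint() {
  local flag_value="${1:-}"
  if [ -n "$flag_value" ]; then
    printf '%s\n' "$flag_value"      # an explicit flag wins
  elif [ -n "${PHOENIX_HOST:-}" ]; then
    printf '%s\n' "$PHOENIX_HOST"    # otherwise fall back to the environment
  else
    echo "no endpoint configured" >&2
    return 1
  fi
}

export PHOENIX_HOST=http://localhost:6006
resolve_endpoint ""                              # prints http://localhost:6006
resolve_endpoint "https://phoenix.example.com"   # prints https://phoenix.example.com
```

In practice this means a one-off `--endpoint` or `--project` flag can target a different instance or project without touching the exported variables.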

Commands

`px project list`

List all available projects.

px project list
px project list --format raw  # JSON output for piping

Option	Description	Default
`--endpoint <url>`	Phoenix API endpoint	From env
`--api-key <key>`	Phoenix API key	From env
`--format <format>`	Output format: `pretty`, `json`, or `raw`	`pretty`
`--no-progress`	Disable progress indicators	—
`--limit <number>`	Maximum projects to fetch per page	100

`px trace list [directory]`

Fetch recent traces from the configured project.

px trace list --limit 10                          # Output to stdout
px trace list ./my-traces --limit 10              # Save to directory
px trace list --last-n-minutes 60 --limit 20      # Filter by time
px trace list --since 2026-01-13T10:00:00Z        # Since timestamp
px trace list --format raw --no-progress | jq     # Pipe to jq

Option	Description	Default
`[directory]`	Save traces as JSON files to directory	stdout
`-n, --limit <number>`	Number of traces to fetch (newest first)	10
`--last-n-minutes <number>`	Only fetch traces from the last N minutes	—
`--since <timestamp>`	Fetch traces since ISO timestamp	—
`--endpoint <url>`	Phoenix API endpoint	From env
`--project <name>`	Project name or ID	From env
`--api-key <key>`	Phoenix API key	From env
`--format <format>`	`pretty`, `json`, or `raw`	`pretty`
`--no-progress`	Disable progress output	—
`--max-concurrent <number>`	Maximum concurrent fetches	10

`px trace get <trace-id>`

Fetch a specific trace by ID.

px trace get abc123def456
px trace get abc123def456 --file trace.json      # Save to file
px trace get abc123def456 --format raw | jq      # Pipe to jq

Option	Description	Default
`--file <path>`	Save to file instead of stdout	stdout
`--format <format>`	`pretty`, `json`, or `raw`	`pretty`
`--endpoint <url>`	Phoenix API endpoint	From env
`--project <name>`	Project name or ID	From env
`--api-key <key>`	Phoenix API key	From env
`--no-progress`	Disable progress indicators	—

`px span list [file]`

Fetch individual spans from the configured project with comprehensive filtering.

px span list --limit 20                                    # Recent spans (table view)
px span list --last-n-minutes 60 --limit 50                # Spans from last hour
px span list --span-kind LLM --limit 10                    # Only LLM spans
px span list --status-code ERROR --limit 20                # Only errored spans
px span list --span-kind LLM TOOL --status-code OK         # Combine filters
px span list --name chat_completion --limit 10             # Filter by span name
px span list --trace-id abc123 --format raw | jq           # All spans for a trace
px span list --attribute llm.model_name:gpt-4              # Filter by attribute
px span list --attribute llm.model_name:gpt-4 --attribute user.role:admin  # AND filters
px span list --include-annotations --limit 10              # Include annotation scores
px span list output.json --limit 100                       # Save to JSON file
px span list --format raw --no-progress | jq               # Pipe to jq

Option	Description	Default
`[file]`	Save spans as JSON to file	stdout
`-n, --limit <number>`	Maximum spans to fetch (newest first)	100
`--last-n-minutes <number>`	Only fetch spans from the last N minutes	—
`--since <timestamp>`	Fetch spans since ISO timestamp	—
`--span-kind <kinds...>`	Filter by span kind (`LLM`, `CHAIN`, `TOOL`, `RETRIEVER`, `EMBEDDING`, `AGENT`, `RERANKER`, `GUARDRAIL`, `EVALUATOR`, `UNKNOWN`)	—
`--status-code <codes...>`	Filter by status code (`OK`, `ERROR`, `UNSET`)	—
`--name <names...>`	Filter by span name(s)	—
`--trace-id <ids...>`	Filter by trace ID(s)	—
`--parent-id <id>`	Filter by parent span ID (use `"null"` for root spans)	—
`--attribute <filters...>`	Filter by attribute key-value pairs. Format: `key:value`. Repeat to AND multiple filters. Values containing colons are supported (split on first `:` only). To match a string attribute that looks like a number or boolean, JSON-quote the value (e.g., `'user.id:"12345"'`). Requires Phoenix server ≥ 14.9.0.	—
`--include-annotations`	Include span annotations in output	Off
`--endpoint <url>`	Phoenix API endpoint	From env
`--project <name>`	Project name or ID	From env
`--api-key <key>`	Phoenix API key	From env
`--format <format>`	`pretty`, `json`, or `raw`	`pretty`
`--no-progress`	Disable progress indicators	—
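
The "split on first `:` only" rule for `--attribute` is easiest to see with a value that itself contains colons. The sketch below reproduces that rule with shell parameter expansion; it illustrates the documented behaviour and is not the CLI's parser. `split_attribute` and the `input.url` attribute name are hypothetical.

```shell
# Illustration only: split a key:value filter on the first colon,
# mirroring the rule documented for --attribute.
split_attribute() {
  local filter="$1"
  attr_key="${filter%%:*}"    # everything before the first colon
  attr_value="${filter#*:}"   # everything after the first colon
}

split_attribute 'llm.model_name:gpt-4'
echo "$attr_key => $attr_value"    # llm.model_name => gpt-4

split_attribute 'input.url:https://example.com/a'
echo "$attr_key => $attr_value"    # input.url => https://example.com/a
```

The same split applied to `'user.id:"12345"'` leaves the double quotes inside the value, which is why the single quotes around the whole argument matter: they stop the shell from stripping the JSON quoting before `px` sees it.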

`px span add-note <span-id>`

Attach a note to a span. Notes are a reserved annotation type; unlike other annotations, they are open-ended, and multiple notes can be attached to the same span.

px span add-note abc123def456 --text "Escalated: retrieval returned empty results"
px span add-note abc123def456 --text "Reviewed by on-call" --format json

Option	Description	Default
`<span-id>`	OpenTelemetry span ID	—
`--text <text>`	Note text to attach to the span	Required
`--endpoint <url>`	Phoenix API endpoint	From env
`--api-key <key>`	Phoenix API key	From env
`--format <format>`	`pretty`, `json`, or `raw`	`pretty`
`--no-progress`	Disable progress indicators	—

`px session list`

List sessions (multi-turn conversations) for a project.

px session list                                       # List recent sessions
px session list --limit 20                            # More sessions
px session list --order asc                           # Oldest first
px session list --format raw --no-progress | jq       # Pipe to jq

Option	Description	Default
`-n, --limit <number>`	Maximum number of sessions to return	10
`--order <order>`	Sort order: `asc` or `desc`	`desc`
`--endpoint <url>`	Phoenix API endpoint	From env
`--project <name>`	Project name or ID	From env
`--api-key <key>`	Phoenix API key	From env
`--format <format>`	`pretty`, `json`, or `raw`	`pretty`
`--no-progress`	Disable progress indicators	—

`px session get <session-id>`

View a session’s conversation flow, including all traces (turns) in the session.

px session get my-chat-session-001                              # By session_id
px session get UHJvamVjdFNlc3Npb24...                           # By GlobalID
px session get my-chat-session-001 --include-annotations        # With annotations
px session get my-chat-session-001 --file session.json          # Save to file
px session get my-chat-session-001 --format raw | jq            # Pipe to jq

Option	Description	Default
`--include-annotations`	Include session annotations	Off
`--file <path>`	Save to file instead of stdout	stdout
`--format <format>`	`pretty`, `json`, or `raw`	`pretty`
`--endpoint <url>`	Phoenix API endpoint	From env
`--project <name>`	Project name or ID	From env
`--api-key <key>`	Phoenix API key	From env
`--no-progress`	Disable progress indicators	—

`px dataset list`

List all available datasets.

px dataset list
px dataset list --format json                    # JSON output
px dataset list --format raw --no-progress | jq  # Pipe to jq

Option	Description	Default
`--endpoint <url>`	Phoenix API endpoint	From env
`--api-key <key>`	Phoenix API key	From env
`--format <format>`	`pretty`, `json`, or `raw`	`pretty`
`--no-progress`	Disable progress indicators	—
`--limit <number>`	Maximum number of datasets	—

`px dataset get <dataset-identifier>`

Fetch examples from a dataset.

px dataset get query_response                        # Fetch all examples
px dataset get query_response --split train          # Filter by split
px dataset get query_response --split train --split test  # Multiple splits
px dataset get query_response --version <version-id> # Specific version
px dataset get query_response --file dataset.json    # Save to file
px dataset get query_response --format raw | jq '.examples[].input'

Option	Description	Default
`--split <name>`	Filter by split (can be used repeatedly)	—
`--version <id>`	Fetch from specific dataset version	latest
`--file <path>`	Save to file instead of stdout	stdout
`--format <format>`	`pretty`, `json`, or `raw`	`pretty`
`--endpoint <url>`	Phoenix API endpoint	From env
`--api-key <key>`	Phoenix API key	From env
`--no-progress`	Disable progress indicators	—

`px experiment list --dataset <name-or-id>`

List experiments for a dataset, optionally exporting full data to files.

px experiment list --dataset my-dataset                 # List experiments
px experiment list --dataset my-dataset --format json   # JSON output
px experiment list --dataset my-dataset ./experiments   # Export to directory

Option	Description	Default
`--dataset <name-or-id>`	Dataset name or ID (required)	—
`[directory]`	Export experiment JSON files to directory	stdout
`--endpoint <url>`	Phoenix API endpoint	From env
`--api-key <key>`	Phoenix API key	From env
`--format <format>`	`pretty`, `json`, or `raw`	`pretty`
`--no-progress`	Disable progress indicators	—
`--limit <number>`	Maximum number of experiments	—

`px experiment get <experiment-id>`

Fetch a single experiment with all run data, including inputs, outputs, evaluations, and trace IDs.

px experiment get RXhwZXJpbWVudDox
px experiment get RXhwZXJpbWVudDox --file exp.json   # Save to file
px experiment get RXhwZXJpbWVudDox --format json     # JSON output

Option	Description	Default
`--file <path>`	Save to file instead of stdout	stdout
`--format <format>`	`pretty`, `json`, or `raw`	`pretty`
`--endpoint <url>`	Phoenix API endpoint	From env
`--api-key <key>`	Phoenix API key	From env
`--no-progress`	Disable progress indicators	—
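
The experiment ID used in these examples looks opaque, but it is plain base64. Decoding it shows a type name and a numeric ID, which can be a quick sanity check that an ID copied from elsewhere refers to the kind of object you expect. This is a small sketch based on the example ID only; `decode_global_id` is a hypothetical helper, and `base64 -d` is the GNU coreutils spelling (some systems use `-D` or `--decode`).

```shell
# Hypothetical helper: decode a base64 ID such as the example experiment ID.
decode_global_id() {
  printf '%s' "$1" | base64 -d
  echo    # add a trailing newline for readability
}

decode_global_id 'RXhwZXJpbWVudDox'
# prints: Experiment:1
```

Treat the decoded form as informational and keep passing the encoded ID to `px experiment get`.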

`px prompt list`

List all available prompts.

px prompt list
px prompt list --format json                    # JSON output
px prompt list --format raw --no-progress | jq  # Pipe to jq

Option	Description	Default
`--endpoint <url>`	Phoenix API endpoint	From env
`--api-key <key>`	Phoenix API key	From env
`--format <format>`	`pretty`, `json`, or `raw`	`pretty`
`--no-progress`	Disable progress indicators	—
`--limit <number>`	Maximum number of prompts	—

`px prompt get <prompt_identifier>`

Show a Phoenix prompt. Supports multiple output formats including a text format optimized for piping to AI coding assistants.

px prompt get my-assistant-prompt                    # Latest version (pretty)
px prompt get my-assistant-prompt --tag production   # Get by tag
px prompt get my-assistant-prompt --version abc123   # Specific version
px prompt get my-assistant-prompt --format json      # JSON output
px prompt get my-assistant-prompt --format text      # Plain text for piping

Option	Description	Default
`--tag <name>`	Get prompt version by tag name	—
`--version <id>`	Get specific prompt version by ID	latest
`--format <format>`	`pretty`, `json`, `raw`, or `text`	`pretty`
`--endpoint <url>`	Phoenix API endpoint	From env
`--api-key <key>`	Phoenix API key	From env
`--no-progress`	Disable progress indicators	—

The text format outputs prompt content with XML-style role tags, ideal for piping to AI assistants:

<system>You are a helpful assistant specialized in...</system>
<user>{{user_input}}</user>

`px api graphql <query>`

Make authenticated GraphQL queries against the Phoenix API. The output is JSON of the form {"data": {...}}; pipe it through jq '.data.<field>' to extract values. Only queries are permitted: mutations and subscriptions are rejected before they reach the server.

px api graphql '<query>' [--endpoint <url>] [--api-key <key>]

Argument/Option	Description	Default
`<query>`	GraphQL query string	—
`--endpoint <url>`	Phoenix API endpoint	`$PHOENIX_HOST`
`--api-key <key>`	Phoenix API key	`$PHOENIX_API_KEY`

Discover the schema with introspection

Use introspection to explore what fields and types are available without leaving your terminal:

$ px api graphql '{ __schema { queryType { fields { name } } } }' | \
    jq '.data.__schema.queryType.fields[].name'
"projects"
"datasets"
"prompts"
"evaluators"
"projectCount"
"datasetCount"
"promptCount"
"evaluatorCount"
"serverStatus"
"viewer"
...

$ px api graphql '{ __type(name: "Experiment") { fields { name type { name } } } }' | \
    jq '.data.__type.fields[] | {name, type: .type.name}'
{"name": "id", "type": "ID"}
{"name": "name", "type": "String"}
{"name": "runCount", "type": "Int"}
{"name": "errorRate", "type": "Float"}
{"name": "averageRunLatencyMs", "type": "Float"}
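
When exploring several types, it can help to generate the `__type` query instead of retyping it. The helper below is a hypothetical convenience, not part of the CLI; it only builds the query string shown above for a given type name.

```shell
# Hypothetical helper: build the introspection query for one GraphQL type.
type_fields_query() {
  printf '{ __type(name: "%s") { fields { name type { name } } } }\n' "$1"
}

type_fields_query Experiment
# prints: { __type(name: "Experiment") { fields { name type { name } } } }

# Usage with the CLI (requires a reachable Phoenix instance):
# px api graphql "$(type_fields_query Experiment)" | jq '.data.__type.fields[].name'
```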

Projects

$ px api graphql '{ projects { edges { node { name traceCount tokenCountTotal } } } }'
{
  "data": {
    "projects": {
      "edges": [
        { "node": { "name": "default", "traceCount": 1482, "tokenCountTotal": 219083 } }
      ]
    }
  }
}

$ px api graphql '{ projects { edges { node { name traceCount } } } }' | \
    jq '.data.projects.edges[].node'
{"name": "default", "traceCount": 1482}

Available fields: id, name, traceCount, recordCount, tokenCountTotal, tokenCountPrompt, tokenCountCompletion, createdAt, updatedAt.

Datasets

$ px api graphql '{ datasets { edges { node { name exampleCount experimentCount } } } }' | \
    jq '.data.datasets.edges[].node'
{"name": "eval-golden-set", "exampleCount": 120, "experimentCount": 4}
{"name": "rag-test-cases", "exampleCount": 50, "experimentCount": 1}

$ px api graphql '{ datasetCount }' | jq '.data.datasetCount'
12

Available fields: id, name, description, exampleCount, experimentCount, evaluatorCount, createdAt, updatedAt.

Experiments

Experiments are nested under datasets in the GraphQL schema:

$ px api graphql '{
  datasets {
    edges {
      node {
        name
        experiments {
          edges {
            node { name runCount errorRate averageRunLatencyMs }
          }
        }
      }
    }
  }
}' | jq '.data.datasets.edges[].node | {dataset: .name, experiments: [.experiments.edges[].node]}'

# Find experiments with non-zero error rate
$ px api graphql '{
  datasets { edges { node { name experiments { edges { node { name errorRate runCount } } } } } }
}' | jq '.. | objects | select(.errorRate? > 0)'

To inspect individual run outputs, errors, and trace IDs:

$ px api graphql '{
  datasets(first: 1) {
    edges { node { experiments(first: 1) { edges { node {
      name
      runs { edges { node { traceId output error latencyMs } } }
    } } } } }
  }
}' | jq '.data.datasets.edges[0].node.experiments.edges[0].node.runs.edges[].node'
{"traceId": "b696d0ac...", "output": {"answer": "Moore's Law is..."}, "error": null, "latencyMs": 1006}

Available run fields: traceId, output, error, latencyMs, startTime, endTime.

Evaluators

$ px api graphql '{ evaluators { edges { node { name kind description isBuiltin } } } }' | \
    jq '.data.evaluators.edges[].node'
{"name": "correctness", "kind": "LLM", "description": "Evaluates answer correctness", "isBuiltin": true}

Instance summary

$ px api graphql '{ projectCount datasetCount promptCount evaluatorCount }'
{
  "data": {
    "projectCount": 1,
    "datasetCount": 12,
    "promptCount": 3,
    "evaluatorCount": 2
  }
}

## Output Formats

**`pretty`** (default) — Human-readable tree view:

    ┌─ Trace: abc123def456
    │
    │  Input: What is the weather in San Francisco?
    │  Output: The weather is currently sunny…
    │
    │  Spans:
    │  └─ ✓ agent_run (CHAIN) - 1250ms
    │     ├─ ✓ llm_call (LLM) - 800ms
    │     └─ ✓ tool_execution (TOOL) - 400ms
    └─

**`json`** — Formatted JSON with indentation.

**`raw`** — Compact JSON for piping to `jq` or other tools.
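
In scripts that are sometimes run interactively and sometimes piped, the format can be chosen explicitly from where stdout points. The sketch below is one possible wrapper, not something the CLI provides; `choose_px_format` and `PX_FORMAT` are hypothetical names, and whether `px` adapts to non-terminal output on its own is not stated here.

```shell
# Hypothetical wrapper logic: decide the --format value from where stdout points.
# It sets PX_FORMAT instead of printing it, because a command substitution
# would replace stdout with a pipe and hide the terminal.
choose_px_format() {
  if [ -t 1 ]; then
    PX_FORMAT=pretty   # interactive terminal: human-readable tree view
  else
    PX_FORMAT=raw      # pipe or file: compact JSON
  fi
}

choose_px_format
echo "selected format: $PX_FORMAT" >&2
# Then, for example: px trace list --format "$PX_FORMAT" --no-progress
```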

## JSON Structure

```json
{
  "traceId": "abc123def456",
  "spans": [
    {
      "name": "chat_completion",
      "context": {
        "trace_id": "abc123def456",
        "span_id": "span-1"
      },
      "span_kind": "LLM",
      "parent_id": null,
      "start_time": "2026-01-17T10:00:00.000Z",
      "end_time": "2026-01-17T10:00:01.250Z",
      "status_code": "OK",
      "attributes": {
        "llm.model_name": "gpt-4",
        "llm.token_count.prompt": 512,
        "llm.token_count.completion": 256,
        "input.value": "What is the weather?",
        "output.value": "The weather is sunny..."
      }
    }
  ],
  "rootSpan": { ... },
  "startTime": "2026-01-17T10:00:00.000Z",
  "endTime": "2026-01-17T10:00:01.250Z",
  "duration": 1250,
  "status": "OK"
}
```

Spans include OpenInference semantic attributes like llm.model_name, llm.token_count.*, input.value, output.value, tool.name, and exception.*.

Examples

Debug failed traces

px trace list --limit 20 --format raw --no-progress | jq '.[] | select(.status == "ERROR")'

Find slowest traces

px trace list --limit 10 --format raw --no-progress | jq 'sort_by(-.duration) | .[0:3]'

Find errored spans

px span list --status-code ERROR --limit 50 --format raw --no-progress | jq '.[] | {name, status_message}'

Inspect LLM spans with annotations

px span list --span-kind LLM --include-annotations --limit 20

Extract LLM models used

px trace list --limit 50 --format raw --no-progress | \
  jq -r '.[].spans[] | select(.span_kind == "LLM") | .attributes["llm.model_name"]' | sort -u

Count errors

px trace list --limit 100 --format raw --no-progress | jq '[.[] | select(.status == "ERROR")] | length'

List datasets and experiments

# List all datasets
px dataset list --format raw --no-progress | jq '.[].name'
# Output: "query_response"

# List experiments for a dataset
px experiment list --dataset query_response --format raw --no-progress | \
  jq '.[] | {id, successful_run_count, failed_run_count}'
# Output: {"id":"RXhwZXJpbWVudDox","successful_run_count":249,"failed_run_count":1}

# Export all experiment data for a dataset to a directory
px experiment list --dataset query_response ./experiments/

Analyze experiment results

# Get input queries and latency from an experiment
px experiment get RXhwZXJpbWVudDox --format raw --no-progress | \
  jq '.[] | {query: .input.query, latency_ms, trace_id}'

# Find failed runs in an experiment
px experiment get RXhwZXJpbWVudDox --format raw --no-progress | \
  jq '.[] | select(.error != null) | {query: .input.query, error}'
# Output: {"query":"looking for complex fodmap meal ideas","error":"peer closed connection..."}

# Calculate average latency across runs
px experiment get RXhwZXJpbWVudDox --format raw --no-progress | \
  jq '[.[].latency_ms] | add / length'

Work with prompts

# List all prompts
px prompt list --format raw --no-progress | jq '.[].name'

# Get prompt template content
px prompt get my-evaluator --format text --no-progress

# Extract the template from the prompt's JSON output
px prompt get my-evaluator --format json --no-progress | jq '.template'

# Get a specific tagged version
px prompt get my-evaluator --tag production --format text --no-progress

Query the GraphQL API directly

# Quick instance summary
$ px api graphql '{ projectCount datasetCount promptCount evaluatorCount }'
{"data": {"projectCount": 1, "datasetCount": 12, "promptCount": 3, "evaluatorCount": 2}}

# Discover all available query fields
$ px api graphql '{ __schema { queryType { fields { name } } } }' | \
    jq '.data.__schema.queryType.fields[].name'

# Projects with stats
$ px api graphql '{ projects { edges { node { name traceCount tokenCountTotal } } } }' | \
    jq '.data.projects.edges[].node'

# Datasets with counts
$ px api graphql '{ datasets { edges { node { name exampleCount experimentCount } } } }' | \
    jq '.data.datasets.edges[].node'

# Find experiments with errors
$ px api graphql '{
  datasets { edges { node { name experiments { edges { node { name errorRate runCount } } } } } }
}' | jq '.. | objects | select(.errorRate? > 0)'

# Drill into run outputs
$ px api graphql '{
  datasets(first: 1) { edges { node {
    experiments(first: 1) { edges { node {
      runs { edges { node { traceId output error latencyMs } } }
    } } }
  } } }
}' | jq '.data.datasets.edges[0].node.experiments.edges[0].node.runs.edges[].node'

# Get viewer info (authenticated instances)
$ px api graphql '{ viewer { username email } }'

Use with AI Coding Assistants

Phoenix CLI is designed to work seamlessly with AI coding assistants like Claude Code, Cursor, and Windsurf.

Claude Code

Ask Claude Code:

Use px to fetch the last 3 traces from my Phoenix project and analyze them for potential improvements

Claude Code will discover the CLI via px --help and fetch your traces for analysis.

Prompt Optimization with Claude Code

Pipe your Phoenix prompts directly to Claude Code for analysis and optimization suggestions:

# Get prompt optimization ideas
px prompt get my-evaluator --format text --no-progress | claude -p "Review this prompt and suggest improvements for clarity and effectiveness"

# Analyze prompt for edge cases
px prompt get my-assistant --format text --no-progress | claude -p "What edge cases might this prompt fail to handle?"

# Generate test cases for a prompt
px prompt get my-classifier --format text --no-progress | claude -p "Generate 5 diverse test inputs to evaluate this prompt"

You can also ask Claude Code to work with your prompts interactively:

Fetch my "correctness-evaluator" prompt from Phoenix and suggest how to make the rubric more specific

Cursor / Windsurf

Run the CLI in the terminal and ask the AI to interpret:

Fetch my recent Phoenix traces using px and explain what my agent is doing

For prompt work:

List my Phoenix prompts with px and help me improve the system prompt for my assistant
