LLM Evaluators require an LLM in order to score an evaluation input. Phoenix evals are provider agnostic and work with virtually any foundation model.
Python Configuration
The Phoenix evals Python package uses an adapter pattern to wrap underlying client SDKs and provide a unified interface. Each adapter forwards parameters directly to the underlying client, so you can use the same configuration options as the native SDK.
- Client configuration parameters (e.g., api_key, base_url, api_version) are passed as **kwargs when creating the LLM instance. These configure the client itself.
- Model invocation parameters (e.g., temperature, max_tokens, top_p) are passed as **kwargs when creating an evaluator. These control how the model generates responses.
Detailed information and examples for each adapter can be found in the sections below.
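The split between client configuration and invocation parameters can be sketched with a toy adapter. The classes below are hypothetical stand-ins, not the actual Phoenix implementation:

```python
# Toy illustration of the adapter pattern described above.
# FakeSDKClient and ToyLLM are hypothetical; Phoenix's real adapters wrap
# actual provider SDKs such as openai.OpenAI.

class FakeSDKClient:
    """Stands in for a provider SDK client (e.g. openai.OpenAI)."""

    def __init__(self, **client_kwargs):
        # Client config: api_key, base_url, timeout, ...
        self.client_kwargs = client_kwargs

    def create(self, **invocation_kwargs):
        # A real client would call the provider API here.
        return {"config": self.client_kwargs, "invocation": invocation_kwargs}


class ToyLLM:
    """Adapter: client config is fixed once, at construction time."""

    def __init__(self, provider: str, model: str, **client_kwargs):
        self.model = model
        self._client = FakeSDKClient(**client_kwargs)

    def generate(self, prompt: str, **invocation_kwargs):
        # Invocation params (temperature, max_tokens, ...) arrive per call,
        # typically forwarded by the evaluator that owns them.
        return self._client.create(model=self.model, prompt=prompt, **invocation_kwargs)


llm = ToyLLM("openai", "gpt-4o", api_key="your-api-key", timeout=30.0)
result = llm.generate("Classify: hello", temperature=0.0, max_tokens=100)
```

The same separation appears in every adapter below: the LLM constructor receives client configuration, and the evaluator receives invocation parameters.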
When creating an LLM, specify:
- provider: The provider name (e.g., "openai", "azure", "anthropic")
- model: The model identifier
- client (optional): The client SDK to use when multiple are installed (e.g., "openai", "langchain", "litellm")
- sync_client_kwargs (optional): Client configuration forwarded only to the sync client
- async_client_kwargs (optional): Client configuration forwarded only to the async client
- **kwargs: Client configuration parameters forwarded to both the sync and async client constructors
To see the currently supported LLM providers and their availability, use the show_provider_availability function:
from phoenix.evals.llm import show_provider_availability
show_provider_availability()
The output shows which providers are available based on installed dependencies, and which client SDKs can be used for each provider:
📦 AVAILABLE PROVIDERS (sorted by client priority)
--------------------------------------------------------------------
Provider | Status | Client | Dependencies
--------------------------------------------------------------------
azure | ✓ Available | openai | openai
openai | ✓ Available | openai | openai
openai | ✓ Available | langchain | langchain, langchain-openai
openai | ✓ Available | litellm | litellm
anthropic | ✓ Available | anthropic | anthropic
anthropic | ✓ Available | langchain | langchain, langchain-anthropic
anthropic | ✓ Available | litellm | litellm
google | ✓ Available | google-genai | google-genai
litellm | ✓ Available | litellm | litellm
bedrock | ✓ Available | litellm | litellm, boto3
vertex | ✓ Available | litellm | litellm
The provider column lists the supported providers, and the status column reads “Available” when the required dependencies are installed in the active Python environment. Note that multiple client SDKs can be used to make LLM requests to a provider; specify the desired client SDK when constructing the LLM wrapper.
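Since availability is determined by which packages are importable, a quick dependency check can be written with the standard library alone. This helper is illustrative, not part of Phoenix:

```python
# Illustrative helper (not a Phoenix API): check whether the packages a
# provider row depends on are importable in the current environment.
import importlib.util


def provider_ready(*packages: str) -> bool:
    """Return True if every named package can be imported."""
    return all(importlib.util.find_spec(pkg) is not None for pkg in packages)


# e.g. the "bedrock" row above requires both litellm and boto3
print(provider_ready("litellm", "boto3"))
```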
OpenAI Adapter
Client: openai.OpenAI() or openai.AsyncOpenAI()
Invocation: client.chat.completions.create()
Docs: OpenAI Python Client
from phoenix.evals.llm import LLM
from phoenix.evals import ClassificationEvaluator
# Client config → LLM creation
llm = LLM(
provider="openai",
model="gpt-4o",
client="openai",
api_key="your-api-key", # Client config param
timeout=30.0, # Client config param
)
# Invocation params → Evaluator creation
evaluator = ClassificationEvaluator(
name="example",
prompt_template="Classify: {input}",
choices={"positive": 1, "negative": 0},
llm=llm,
temperature=0.0, # Invocation param
max_tokens=100, # Invocation param
)
Azure OpenAI Adapter
Client: openai.AzureOpenAI() or openai.AsyncAzureOpenAI()
Invocation: client.chat.completions.create()
Docs: Azure OpenAI Python SDK
Note: The model parameter should be your Azure deployment name.
from phoenix.evals.llm import LLM
from phoenix.evals import ClassificationEvaluator
llm = LLM(
provider="azure",
model="gpt-4o-deployment", # Azure deployment name
api_key="your-azure-api-key",
api_version="2024-02-15-preview",
azure_endpoint="https://your-resource.openai.azure.com",
)
evaluator = ClassificationEvaluator(
name="example",
prompt_template="Classify: {input}",
choices={"positive": 1, "negative": 0},
llm=llm,
temperature=0.0,
max_tokens=100,
)
LiteLLM Adapter
Client: Lightweight wrapper (no traditional client object)
Invocation: litellm.completion() or litellm.acompletion()
Docs: LiteLLM Documentation
Note: Model names must use provider route format: {provider}/{model} (e.g., "x-ai/grok-2").
from phoenix.evals.llm import LLM
from phoenix.evals import ClassificationEvaluator
import os
os.environ["XAI_API_KEY"] = "your-xai-api-key"
llm = LLM(
provider="litellm",
model="x-ai/grok-2", # Provider route format
client="litellm",
)
evaluator = ClassificationEvaluator(
name="example",
prompt_template="Classify: {input}",
choices={"positive": 1, "negative": 0},
llm=llm,
temperature=0.0,
max_tokens=100,
)
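The {provider}/{model} route format splits on the first slash; everything after it is the model name, which may itself contain slashes. A small illustrative helper (not part of LiteLLM or Phoenix):

```python
def split_route(route: str) -> tuple[str, str]:
    """Split a LiteLLM-style route into (provider, model).

    The provider is everything before the first "/"; the remainder is the
    model identifier, which may itself contain slashes.
    """
    provider, _, model = route.partition("/")
    if not model:
        raise ValueError(f"expected '{{provider}}/{{model}}', got {route!r}")
    return provider, model


print(split_route("x-ai/grok-2"))             # ('x-ai', 'grok-2')
print(split_route("openrouter/x-ai/grok-2"))  # ('openrouter', 'x-ai/grok-2')
```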
LangChain Adapter
Client: LangChain chat model classes (e.g., langchain_openai.ChatOpenAI, langchain_anthropic.ChatAnthropic)
Invocation: client.invoke() or client.predict()
Docs: LangChain OpenAI, LangChain Anthropic
from phoenix.evals.llm import LLM
from phoenix.evals import ClassificationEvaluator
llm = LLM(
provider="openai",
model="gpt-4o",
client="langchain",
api_key="your-api-key",
)
evaluator = ClassificationEvaluator(
name="example",
prompt_template="Classify: {input}",
choices={"positive": 1, "negative": 0},
llm=llm,
temperature=0.0,
max_tokens=100,
)
Anthropic Adapter
Client: anthropic.Anthropic() or anthropic.AsyncAnthropic()
Invocation: client.messages.create()
Docs: Anthropic Python SDK
Note: The Anthropic API requires max_tokens; if it is not specified when creating the evaluator, it defaults to 4096.
from phoenix.evals.llm import LLM
from phoenix.evals import ClassificationEvaluator
llm = LLM(
provider="anthropic",
model="claude-3-5-sonnet-20241022",
api_key="your-anthropic-api-key",
timeout=30.0,
)
evaluator = ClassificationEvaluator(
name="example",
prompt_template="Classify: {input}",
choices={"positive": 1, "negative": 0},
llm=llm,
temperature=0.0,
max_tokens=1024,
)
Google GenAI Adapter
Client: google.genai.Client()
Invocation: client.models.generate_content()
Docs: Google GenAI Python SDK
from phoenix.evals.llm import LLM
from phoenix.evals import ClassificationEvaluator
llm = LLM(
provider="google",
model="gemini-2.0-flash-exp",
api_key="your-google-api-key", # or set env var
)
evaluator = ClassificationEvaluator(
name="example",
prompt_template="Classify: {input}",
choices={"positive": 1, "negative": 0},
llm=llm,
temperature=0.0,
)
Separate Sync/Async Client Configuration
Some providers (OpenAI, Anthropic) create separate sync and async SDK clients internally. The sync_client_kwargs and async_client_kwargs parameters allow passing configuration that applies only to one client type, useful for:
- Different timeouts: Longer timeouts for async batch operations
- Different HTTP clients: Custom httpx clients for sync vs async
- Different retry configurations: More aggressive retries for batch async calls
Example: Different Timeouts for Sync and Async Clients
from phoenix.evals.llm import LLM
llm = LLM(
provider="openai",
model="gpt-4o",
api_key="your-api-key",
sync_client_kwargs={"timeout": 30.0},
async_client_kwargs={"timeout": 120.0},
)
Example: Custom HTTP Clients
import httpx
from phoenix.evals.llm import LLM
llm = LLM(
provider="openai",
model="gpt-4o",
api_key="your-api-key",
sync_client_kwargs={"http_client": httpx.Client(timeout=30.0)},
async_client_kwargs={"http_client": httpx.AsyncClient(timeout=120.0)},
)
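Conceptually, the shared **kwargs and the per-client dicts combine like a dict merge. The sketch below assumes per-client values override shared ones; it illustrates the documented forwarding behavior, not the actual implementation:

```python
# Sketch of how shared and per-client kwargs could combine (assumed
# precedence: per-client entries override shared ones).
def build_client_kwargs(shared: dict, per_client: dict) -> dict:
    """Shared kwargs go to both clients; per-client entries take precedence."""
    return {**shared, **per_client}


shared = {"api_key": "your-api-key", "timeout": 10.0}
sync_kwargs = build_client_kwargs(shared, {"timeout": 30.0})
async_kwargs = build_client_kwargs(shared, {"timeout": 120.0})
print(sync_kwargs["timeout"], async_kwargs["timeout"])  # 30.0 120.0
```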
TypeScript Configuration
The TypeScript evaluation library uses the AI SDK’s LanguageModel type for model abstraction. Models are created using AI SDK provider functions and passed directly to evaluators.
Installation
# Install model provider(s) separately based on your needs
npm install @ai-sdk/openai # For OpenAI models
npm install @ai-sdk/anthropic # For Anthropic models
npm install @ai-sdk/google # For Google models
npm install @ai-sdk/azure # For Azure OpenAI models
Configuring Model Providers
Import and configure your model provider, then pass it to evaluators:
import { openai } from "@ai-sdk/openai";
import { anthropic } from "@ai-sdk/anthropic";
// OpenAI model
const openaiModel = openai("gpt-4o-mini");
// Anthropic model
const anthropicModel = anthropic("claude-sonnet-4-20250514");
The AI SDK handles authentication via environment variables (e.g., OPENAI_API_KEY, ANTHROPIC_API_KEY), or you can pass configuration directly:
import { createOpenAI } from "@ai-sdk/openai";
import { createAzure } from "@ai-sdk/azure";
// OpenAI with custom configuration
const openai = createOpenAI({
apiKey: "my-openai-api-key",
baseURL: "https://custom-endpoint.com/v1",
});
const model = openai("gpt-4o-mini");
// Azure OpenAI
const azure = createAzure({
apiKey: "your-azure-api-key",
resourceName: "your-resource-name",
});
const azureModel = azure("your-deployment-name");
Using with LLM Evaluators
import { createClassificationEvaluator } from "@arizeai/phoenix-evals/llm";
import { openai } from "@ai-sdk/openai";
const model = openai("gpt-4o-mini");
// Create a classification evaluator
const evaluator = createClassificationEvaluator({
name: "factual_check",
model,
choices: { factual: 1, hallucinated: 0 },
promptTemplate: "Your evaluation prompt here: {input}",
});
Invocation Parameters
Model invocation parameters (such as temperature and maxTokens) are passed through to the underlying AI SDK generateObject call. However, the current TypeScript type definitions do not include these parameters in CreateClassifierArgs or CreateClassificationEvaluatorArgs, so passing them directly produces compile-time type errors.
Note: Invocation parameters do work at runtime; they are captured via the ...rest spread in createClassifierFn and forwarded to generateObject. Because the AI SDK does not support setting default invocation parameters at the model level, use a type assertion (as shown in the example below) to pass them:
const evaluator = createClassificationEvaluator({
name: "factual_check",
model,
choices: { factual: 1, hallucinated: 0 },
promptTemplate: "Your evaluation prompt here: {input}",
temperature: 0.0,
maxTokens: 100,
} as any);
For more configuration options and provider-specific settings, refer to the AI SDK documentation.