Providers

KosmoKrator supports a large LLM provider catalog through Prism, direct provider clients, and OpenAI-compatible custom providers. Providers are configured once and then reused by terminal sessions, headless runs, SDK runs, ACP clients, Telegram gateway sessions, and subagents.

Provider configuration can be performed interactively through kosmo setup, headlessly through the providers:* commands, or through the SDK config helpers. The full shell command reference is in the CLI Reference.

Every built-in provider is ready to use once credentials are entered. The live catalog is large and is sourced from the application provider registry, so treat kosmo providers:list --json and kosmo providers:models <provider> --json as the source of truth. The table below highlights common providers and how each is configured.

| Provider ID | Label | Auth Mode | Notes |
| --- | --- | --- | --- |
| anthropic | Anthropic | API Key | Claude family — Opus 4.5, Sonnet 4.5, Haiku 4.5 |
| openai | OpenAI | API Key | GPT-4o, GPT-4.1 family, o-series reasoning models |
| codex | Codex (ChatGPT) | OAuth | Browser/device login flow, uses your ChatGPT subscription |
| gemini | Google Gemini | API Key | Gemini 2.5 Pro and Flash |
| deepseek | DeepSeek | API Key | DeepSeek V3 (chat), R1 (reasoning) |
| groq | Groq | API Key | Ultra-fast inference on dedicated hardware |
| mistral | Mistral | API Key | Mistral Large, Codestral |
| xai | xAI | API Key | Grok 3, with reasoning support |
| openrouter | OpenRouter | API Key | Meta-router for 100+ models from multiple providers |
| perplexity | Perplexity | API Key | Online search-augmented models |
| ollama | Ollama | None | Local models, no remote credentials required |
| kimi | Kimi (Moonshot) | API Key | Long-context Chinese/English models |
| kimi-coding | Kimi Coding | API Key | Code-optimized Moonshot endpoint |
| mimo | Xiaomi MiMo Token Plan | API Key | MiMo models via token-plan key (free tier available) |
| mimo-api | Xiaomi MiMo API | API Key | MiMo pay-as-you-go API |
| minimax | MiniMax | API Key | MiniMax models |
| minimax-cn | MiniMax CN | API Key | MiniMax China-region endpoint |
| z | Z.AI | API Key | Z.AI coding endpoint |
| z-api | Z.AI API | API Key | Z.AI standard API endpoint |
| stepfun | StepFun | API Key | Step models |
| stepfun-plan | StepFun Plan | API Key | Step Plan subscription endpoint with reasoning support |

The easiest way to configure credentials is the interactive setup command, which walks you through provider selection and API key entry:

```sh
kosmo setup
```

The same setup can run headlessly. Use this form for containers, CI, and remote machines:

```sh
# Configure an API-key provider and default model without exposing the key in argv
printf %s "$OPENAI_API_KEY" | \
  kosmo setup --provider openai --model gpt-5.4-mini \
  --api-key-stdin --global --json

# Equivalent provider-specific command
printf %s "$OPENAI_API_KEY" | \
  kosmo providers:configure openai --model gpt-5.4-mini \
  --api-key-stdin --global --json

# OAuth providers can use device login when available
kosmo providers:configure codex --device --global --json
```

Provider commands are designed for agents and scripts: they expose stable JSON, never print raw secrets, and include enough metadata to choose valid next commands.

```sh
# List providers, auth mode, source, and configured status
kosmo providers:list --json

# Show one provider's status
kosmo providers:status openai --json

# List advertised models for a provider
kosmo providers:models openai --json

# Refresh the cached inventory directly from the provider API
kosmo providers:models openai --live --json
kosmo providers:refresh-models openai --json

# Diagnose auth, endpoint, catalog freshness, and a specific model ID
kosmo providers:doctor openai --model gpt-5.4-mini --json

# Clear a stored API key
kosmo providers:logout openai --json
```

Provider commands reject unknown provider IDs with success: false and a non-zero exit code. This keeps automation from mistaking an empty result for a valid provider.
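In a script, that contract means you can branch directly on the exit code. A sketch (the provider ID here is deliberately invalid):

```sh
# Unknown provider IDs exit non-zero, so automation can fail fast:
if ! kosmo providers:status not-a-provider --json >/dev/null 2>&1; then
  echo "unknown or unconfigured provider" >&2
fi
```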

The normal provider catalog is available offline from the bundled Prism Relay registry. When you want the newest provider model list, run a live refresh. KosmoKrator stores the result in SQLite and overlays it on top of bundled metadata, preserving known context windows and pricing where provider APIs only return model IDs.

Live refresh is explicit so ordinary kosmo startup does not block on provider APIs. JSON output includes model_source, model_fetched_at, model_inventory_fresh, and any model_inventory_error.
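Those fields make freshness checks scriptable. A minimal sketch that matches on an illustrative payload (the field names come from the docs above; real output comes from kosmo providers:models <provider> --json):

```sh
# Illustrative payload; field names are real, values are made up.
status='{"model_source":"live","model_fetched_at":"2025-01-01T00:00:00Z","model_inventory_fresh":true}'
case "$status" in
  *'"model_inventory_fresh":true'*) msg="catalog fresh" ;;
  *) msg="refresh needed" ;;
esac
echo "$msg"
```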

If a provider launched a model before the catalog knows about it, headless setup can still pin it intentionally:

```sh
kosmo providers:configure openai \
  --model future-model-id \
  --allow-unlisted-model \
  --global --json

kosmo settings:set agent.default_model future-model-id \
  --provider openai \
  --allow-unlisted-model \
  --global --json
```

Custom OpenAI-compatible providers accept free-text model identifiers by default, unless their relay definition sets strict_models: true.

API keys entered through the setup wizard, providers:configure, or secrets:set are stored in the local SQLite database at ~/.kosmo/data/kosmo.db. Keys are never written to plain-text config files and JSON output only reports masked/configured status.

```sh
# Set a provider key without putting it in argv history
printf %s "$OPENAI_API_KEY" | \
  kosmo secrets:set provider.openai.api_key --stdin --json

# Check managed secret status
kosmo secrets:status provider.openai.api_key --json
kosmo secrets:list --json
kosmo secrets:unset provider.openai.api_key --json
```

Alternatively, you can set provider API keys via environment variables. These are read through your Prism PHP configuration and take effect only when no key is stored in the database. Common variables:

  • ANTHROPIC_API_KEY — Anthropic
  • OPENAI_API_KEY — OpenAI
  • DEEPSEEK_API_KEY — DeepSeek
  • GROQ_API_KEY — Groq
  • MISTRAL_API_KEY — Mistral
  • XAI_API_KEY — xAI
  • OPENROUTER_API_KEY — OpenRouter
  • PERPLEXITY_API_KEY — Perplexity
  • GEMINI_API_KEY — Google Gemini
  • KIMI_API_KEY — Kimi / Kimi Coding
  • MIMO_API_KEY — MiMo (token plan)
  • MIMO_PAYG_API_KEY — MiMo (pay-as-you-go API)
  • MINIMAX_API_KEY — MiniMax
  • MINIMAX_CN_API_KEY — MiniMax CN (China region)
  • STEPFUN_API_KEY — StepFun / StepFun Plan
  • ZAI_API_KEY — Z.AI / Z.AI API
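For example, in a shell profile or CI environment (values are placeholders):

```sh
# Used only when no key is stored in the database for that provider.
export OPENAI_API_KEY="sk-placeholder"
export ANTHROPIC_API_KEY="sk-ant-placeholder"
```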

Database-stored keys always take priority over environment variables. If you set a key via /settings and also have an environment variable, the stored key is used.

The codex provider uses a browser-based OAuth device login flow tied to your ChatGPT subscription. When you select Codex as your provider:

  1. KosmoKrator starts a local callback server on port 9876 (configurable in config/kosmo.yaml).
  2. Your browser opens to a ChatGPT authorization page.
  3. After granting access, the OAuth tokens are stored and refreshed automatically.

Token status is shown in the settings UI — including the associated email, expiration state, and whether a refresh is due.

You can change the active provider and model at any time during a session:

  1. Open the settings panel with the /settings command.
  2. Navigate to the Agent category.
  3. Change default_provider to the desired provider ID.
  4. Change default_model to a model supported by that provider.

Both settings have applies_now effect — the change takes effect on the very next LLM call without restarting the session.

The model selector is filtered by the currently selected provider. Change the provider first, then pick from its available models.
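The same switch works headlessly with settings:set. A sketch, assuming the Agent settings are addressed as agent.default_provider and agent.default_model (the latter key appears earlier on this page; the model ID is illustrative):

```sh
kosmo settings:set agent.default_provider anthropic --global --json
kosmo settings:set agent.default_model claude-sonnet-4-5 --global --json
```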

KosmoKrator supports running different models at different agent depths. This lets you use a powerful (and more expensive) model for the main agent while routing subagents to faster or cheaper models.

| Depth | Role | Settings | Fallback |
| --- | --- | --- | --- |
| 0 | Main agent | default_provider / default_model | (none) |
| 1 | Subagents | subagent_provider / subagent_model | Inherits from depth 0 |
| 2+ | Sub-subagents | subagent_depth2_provider / subagent_depth2_model | Inherits from depth 1, then depth 0 |

The resolution cascade works as follows: depth-2+ overrides fall back to depth-1 overrides, which fall back to the main agent defaults. Leave a setting empty to inherit from the parent depth.
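The cascade behaves like nested default-expansion. An illustrative sketch in plain shell (not KosmoKrator source; variable names mirror the settings keys):

```sh
default_model="claude-opus-4-5"
subagent_model="claude-haiku-4-5"
subagent_depth2_model=""   # empty -> inherit from depth 1

# Resolve the model for a given agent depth ($1), falling back parent-ward.
resolve_model() {
  case "$1" in
    0) echo "$default_model" ;;
    1) echo "${subagent_model:-$default_model}" ;;
    *) echo "${subagent_depth2_model:-${subagent_model:-$default_model}}" ;;
  esac
}

resolve_model 2   # depth 2 inherits the depth-1 override
```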

```yaml
# Main agent — most capable model
default_provider: anthropic
default_model: claude-opus-4-5-20250929

# Subagents — fast and affordable
subagent_provider: anthropic
subagent_model: claude-haiku-4-5-20251001

# Sub-subagents — inherit from subagent settings
# (leave subagent_depth2_provider and subagent_depth2_model empty)
```

Per-depth overrides are configured under the Subagents category in /settings. Each setting applies immediately when changed.

Any OpenAI-compatible API endpoint can be added as a custom provider. This is useful for self-hosted models, corporate proxies, or providers not yet included in the built-in catalog.

  1. Open /settings and navigate to Provider Setup.
  2. Add a new provider with a unique ID.
  3. Configure the required fields (listed in the table below).

Or create/update the provider headlessly:

```sh
printf %s "$CORP_LLM_API_KEY" | kosmo providers:custom:upsert corp_llm \
  --label "Corporate LLM" \
  --url https://llm.corp.example/v1 \
  --model llama-3.1-70b \
  --context 128000 \
  --max-output 8192 \
  --api-key-stdin \
  --global --json

kosmo providers:custom:list --json
kosmo providers:custom:delete corp_llm --json
```

For richer definitions, pass JSON on stdin. The payload may include id, scope, api_key, and a definition object with the same fields used in YAML (label, driver, auth, url, default_model, modalities, and models).
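A sketch of that richer form, assuming the command accepts the payload on stdin. The field names come from the list above, but the exact invocation and payload shape may differ by version, and the driver value is an assumption:

```sh
# "driver": "openai" is an assumption; check your relay definitions.
kosmo providers:custom:upsert --json <<'JSON'
{
  "id": "corp_llm",
  "scope": "global",
  "api_key": "sk-corp-...",
  "definition": {
    "label": "Corporate LLM",
    "driver": "openai",
    "url": "https://llm.corp.example/v1",
    "default_model": "llama-3.1-70b",
    "modalities": ["text"],
    "models": [{ "id": "llama-3.1-70b", "context": 128000 }]
  }
}
JSON
```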

| Field | Description | Example |
| --- | --- | --- |
| label | Human-readable name shown in the UI | My Corporate LLM |
| base_url | Full URL to the chat completions endpoint | https://llm.corp.example/v1 |
| api_key | API key for authentication | sk-corp-... |
| default_model | Model identifier to use by default | llama-3.1-70b |

Custom providers use the relay system for request/response normalization, so they work with tool calling, streaming, and all other agent features as long as the endpoint implements the OpenAI chat completions format.

Some providers support extended thinking / reasoning modes, where the model performs chain-of-thought reasoning before producing its final answer. KosmoKrator controls this via the reasoning_effort setting (under the Agent category in /settings).

| Provider | Reasoning Behavior | Effort Levels |
| --- | --- | --- |
| openai | Controllable via reasoning_effort for o-series models (o1, o3, o4-mini) | low / medium / high |
| xai | Controllable via reasoning_effort for Grok 3 Think models | low / medium / high |
| deepseek | Always-on reasoning for R1 models | Not configurable |
| stepfun, stepfun-plan | Always-on reasoning | Not configurable |
| kimi, kimi-coding | Always-on reasoning | Not configurable |
| groq | Always-on reasoning | Not configurable |
| mistral | Always-on reasoning | Not configurable |
| perplexity | Always-on reasoning | Not configurable |
| openrouter | Always-on reasoning | Not configurable |
| z, z-api | Always-on reasoning | Not configurable |
| minimax, minimax-cn | Always-on reasoning | Not configurable |
| mimo, mimo-api | Always-on reasoning | Not configurable |
| All others | No reasoning support | Setting is safely ignored |

Anthropic supports extended thinking (chain-of-thought) via Prism’s native driver, but this is not controlled through the reasoning_effort parameter. It is handled internally by the driver when supported models are used.

The available effort levels are off, low, medium, and high. Setting the value to off disables reasoning parameters entirely, even for providers that support it.

Reasoning models tend to produce longer, more thorough responses but use significantly more tokens. Use low or medium for routine tasks and reserve high for complex multi-step problems.
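Adjusting the level headlessly, assuming the setting is addressed as agent.reasoning_effort (it lives under the Agent category, like the other agent.* keys on this page):

```sh
kosmo settings:set agent.reasoning_effort high --global --json

# Back to a cheaper level for routine work
kosmo settings:set agent.reasoning_effort low --global --json
```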

Under the hood, KosmoKrator uses two client implementations to communicate with LLM providers. The correct client is selected automatically based on the provider.

The primary client for most providers. Built on Amp HTTP, it sends raw HTTP requests to OpenAI-compatible chat completions endpoints with full async streaming support. Used for:

  • OpenAI, DeepSeek, Groq, Mistral, xAI, OpenRouter, Perplexity
  • Ollama, Kimi, Kimi Coding, MiMo, MiMo API, Z.AI, Z.AI API, StepFun, StepFun Plan
  • All custom providers (OpenAI-compatible endpoints)

A synchronous client backed by the Prism PHP SDK. Used for providers that have native Prism drivers with specialized request/response handling:

  • Anthropic (Claude) — uses Prism’s native Anthropic driver with prompt caching
  • Google Gemini — uses Prism’s native Gemini driver
  • MiniMax, MiniMax CN — uses Prism’s Anthropic-compatible driver (Anthropic-format endpoints)

A decorator that wraps either client, adding automatic retry logic with exponential backoff and jitter. Retries are triggered on:

  • Rate limits (HTTP 429) — honors Retry-After headers from the provider
  • Server errors (HTTP 5xx) — transient provider outages
  • Network failures — connection timeouts, DNS resolution errors

The maximum number of retry attempts is configurable via the max_retries setting. A value of 0 means unlimited retries (the agent keeps trying until the provider responds successfully).
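The retry shape can be sketched in a few lines of shell. This is illustrative only, not the actual client code; do_request stands in for an LLM call that fails transiently:

```sh
do_request() { [ "$attempt" -ge 2 ]; }   # stub: fails twice, then succeeds

max_retries=5
attempt=0
until do_request; do
  attempt=$((attempt + 1))
  if [ "$attempt" -ge "$max_retries" ]; then
    echo "giving up after $attempt attempts"
    break
  fi
  backoff=$((1 << attempt))             # exponential: 2, 4, 8, ... seconds
  jitter=$(( ${RANDOM:-123} % 1000 ))   # up to ~1s of extra jitter
  # sleep "$backoff.$jitter"            # commented out so the sketch runs instantly
done
echo "succeeded after $attempt retries"
```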