NBI’s OpenAI-compatible provider speaks the Chat Completions API. Anything that exposes that API works — Anthropic via the direct API, vLLM, LiteLLM, TGI, llama.cpp’s HTTP server, hosted services, your own gateway.

What you get

  • Bring your own endpoint. Provide a base URL, an API key, and a model ID. NBI handles the rest.
  • Tool calling and inline completion. Tool calling works wherever the upstream server supports it (vLLM with --enable-auto-tool-choice, LiteLLM with native or translated tool calls, etc.).
  • Per-pod overrides. Useful for centralized gateways: lock the base URL at the JupyterHub level with NBI_OPENAI_BASE_URL and let users only pick the model.

Configuration

{
  "chat_model": {
    "provider": "openai-compatible",
    "model": "gpt-oss-120b",
    "base_url": "https://your-gateway.internal/v1",
    "api_key": "sk-..."
  }
}

Reference

Install NBI