API Documentation
OpenAI-compatible API, SDKs, Terraform provider, and serverless GPU functions. Drop-in replacement for your existing code.
Authentication
All API requests require authentication via API key (LLM API) or OIDC token (Dashboard API).
# LLM API — use your API key
curl /v1/chat/completions \
-H "Authorization: Bearer sk-ic-YOUR_API_KEY" \
-H "Content-Type: application/json"
# Dashboard API — use your OIDC token
curl /api/v1/instances \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"
Chat Completions
Generate chat completions through the OpenAI-compatible API. Streaming is supported.
curl /v1/chat/completions \
-H "Authorization: Bearer sk-ic-YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "llama-3-8b",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Explain GPU time-slicing in one paragraph."}
],
"temperature": 0.7,
"max_tokens": 256
}'
Streaming
Stream responses token-by-token for real-time output.
curl /v1/chat/completions \
-H "Authorization: Bearer sk-ic-YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "llama-3-8b",
"messages": [{"role": "user", "content": "Write a haiku about GPUs."}],
"stream": true
}'
GPU Instances
Create and manage GPU instances with SSH access. Choose interactive (guaranteed availability) or batch (50% discount, preemptible) workloads. Configurable storage (10–500 GB, default 50 GB) with unified quota enforcement. Tiers: dedicated (full isolation), MIG (hardware memory isolation), timesliced (shared — no memory isolation, GPU memory zeroed before each workload).
# List GPU tiers (public, no auth)
curl /api/v1/gpu-tiers
# Create an interactive instance (default 50 GB storage)
curl -X POST /api/v1/instances \
-H "Authorization: Bearer YOUR_OIDC_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "my-training-job",
"tier": "dedicated",
"ssh_key": "ssh-ed25519 AAAA...",
"storage_gb": 100
}'
# Create a batch instance (50% discount, preemptible)
curl -X POST /api/v1/instances \
-H "Authorization: Bearer YOUR_OIDC_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "batch-job",
"tier": "timesliced",
"ssh_key": "ssh-ed25519 AAAA...",
"workload_type": "batch",
"storage_gb": 50
}'
# List instances
curl /api/v1/instances \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"
# Start / Stop / Requeue
curl -X POST /api/v1/instances/inst-abc123/stop \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"
curl -X POST /api/v1/instances/inst-abc123/start \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"
# Re-queue a preempted batch instance
curl -X POST /api/v1/instances/inst-abc123/requeue \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"
# Delete instance
curl -X DELETE /api/v1/instances/inst-abc123 \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"Deploy vLLM on a GPU Instance
SSH into your GPU instance and run vLLM to serve any Hugging Face model with an OpenAI-compatible API.
# 1. SSH into your instance
ssh -p <PORT> gpuuser@<HOST>
# 2. Run the setup script (creates venv, installs vLLM)
./setup-vllm.sh meta-llama/Llama-3.1-8B-Instruct
# Or install manually:
python3 -m venv ~/.venv/vllm && source ~/.venv/vllm/bin/activate
pip install vllm
python -m vllm.entrypoints.openai.api_server \
--model meta-llama/Llama-3.1-8B-Instruct \
--host 0.0.0.0 --port 8000
# 3. From another terminal, forward port 8000 via SSH
ssh -p <PORT> -L 8000:localhost:8000 gpuuser@<HOST>
# 4. Query the model
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "meta-llama/Llama-3.1-8B-Instruct",
"messages": [{"role": "user", "content": "Hello!"}]
}'
Deploy Ollama on a GPU Instance
SSH into your GPU instance and run Ollama for one-command model downloads and serving.
# 1. SSH into your instance
ssh -p <PORT> gpuuser@<HOST>
# 2. Run the setup script (installs to ~/bin, models to ~/.ollama)
./setup-ollama.sh llama3
# Or install manually:
mkdir -p ~/bin
curl -fsSL https://github.com/ollama/ollama/releases/latest/download/ollama-linux-amd64.tar.zst | tar --zstd -x -C /tmp
cp /tmp/bin/ollama ~/bin/ollama && chmod +x ~/bin/ollama
ollama serve &
ollama pull llama3
ollama run llama3
# 3. From another terminal, forward port 11434 via SSH
ssh -p <PORT> -L 11434:localhost:11434 gpuuser@<HOST>
# 4. Query the model via API
curl http://localhost:11434/api/chat \
-d '{
"model": "llama3",
"messages": [{"role": "user", "content": "Hello!"}],
"stream": false
}'
Deploy SGLang on a GPU Instance
SSH into your GPU instance and run SGLang for high-throughput LLM serving with RadixAttention.
# 1. SSH into your instance
ssh -p <PORT> gpuuser@<HOST>
# 2. Run the setup script (creates venv, installs SGLang)
./setup-sglang.sh meta-llama/Llama-3.1-8B-Instruct
# Or install manually:
python3 -m venv ~/.venv/sglang
source ~/.venv/sglang/bin/activate
pip install "sglang[all]"
python -m sglang.launch_server \
--model-path meta-llama/Llama-3.1-8B-Instruct \
--host 0.0.0.0 --port 30000
# 3. From another terminal, forward port 30000 via SSH
ssh -p <PORT> -L 30000:localhost:30000 gpuuser@<HOST>
# 4. Query the model (OpenAI-compatible)
curl http://localhost:30000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "meta-llama/Llama-3.1-8B-Instruct",
"messages": [{"role": "user", "content": "Hello!"}]
}'
Token Management
Check balance and manage prepaid token credits.
# Check balance
curl /api/v1/tokens/balance \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"
# Purchase tokens
curl -X POST /api/v1/tokens/purchase \
-H "Authorization: Bearer YOUR_OIDC_TOKEN" \
-H "Content-Type: application/json" \
-d '{"amount": 1000000}'
# View usage history
curl /api/v1/tokens/usage \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"Embeddings
Generate text embeddings for semantic search, clustering, and classification.
curl /v1/embeddings \
-H "Authorization: Bearer sk-ic-YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "bge-large-en-v1.5",
"input": "GPU computing enables massive parallelism"
}'
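The vectors returned by this endpoint can be compared with cosine similarity for semantic search or clustering. A minimal self-contained sketch (the 3-dimensional vectors are toy stand-ins for real embedding output, which is much higher-dimensional):

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for two embedding responses.
v1 = [0.1, 0.9, 0.2]
v2 = [0.1, 0.8, 0.3]
print(round(cosine_similarity(v1, v2), 3))
```

A score near 1.0 means the texts are semantically close; near 0.0 means unrelated.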
# Batch embeddings — pass an array of strings
curl /v1/embeddings \
-H "Authorization: Bearer sk-ic-YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "bge-large-en-v1.5",
"input": [
"First document to embed",
"Second document to embed",
"Third document to embed"
]
}'
Model Deployments
Deploy and manage LLM models on the platform. Models are served via KubeAI with automatic scaling.
# List your model deployments
curl /api/v1/models \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"
# Deploy a new model
curl -X POST /api/v1/models \
-H "Authorization: Bearer YOUR_OIDC_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "my-llama",
"model": "meta-llama/Llama-3.1-8B-Instruct",
"engine": "vllm",
"minReplicas": 0,
"maxReplicas": 2,
"resourceProfile": "nvidia-gpu-dedicated",
"huggingfaceToken": "hf_..."
}'
# Scale a deployment
curl -X PUT /api/v1/models/my-llama/scaling \
-H "Authorization: Bearer YOUR_OIDC_TOKEN" \
-H "Content-Type: application/json" \
-d '{"minReplicas": 1, "maxReplicas": 3}'
# Stop a model (scale to zero)
curl -X POST /api/v1/models/my-llama/stop \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"GPU Functions (Serverless)
Run GPU workloads as serverless functions. Submit a container image and entrypoint, invoke on demand — billed per execution.
# Create a GPU function
curl -X POST /api/v1/functions \
-H "Authorization: Bearer YOUR_OIDC_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "image-classifier",
"image": "registry.gpu.local/myteam/classifier:v1",
"entrypoint": "python classify.py",
"gpuTier": "timesliced",
"timeoutSeconds": 300,
"envVars": {"MODEL_PATH": "/models/resnet50.pt"}
}'
# Run the function
curl -X POST /api/v1/functions/fn-abc12345/run \
-H "Authorization: Bearer YOUR_OIDC_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"input": {
"image_url": "https://example.com/photo.jpg",
"top_k": 5
}
}'
# Returns 202 Accepted with run ID
# Check run status
curl /api/v1/functions/fn-abc12345/runs/run-def67890 \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"
# List all functions
curl /api/v1/functions \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"
# List runs for a function
curl /api/v1/functions/fn-abc12345/runs \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"
# Delete a function
curl -X DELETE /api/v1/functions/fn-abc12345 \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"Fine-tuning
Fine-tune LLMs with your own data using an OpenAI-compatible API. Supports LoRA and QLoRA methods. Upload training data as JSONL, create a job, and monitor progress.
# 1. Upload a training file (JSONL format)
curl -X POST /v1/files \
-H "Authorization: Bearer sk-ic-YOUR_API_KEY" \
-F "file=@training_data.jsonl" \
-F "purpose=fine-tune"
# Training file format (each line):
# {"messages": [{"role": "system", "content": "..."}, {"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
# 2. Create a fine-tuning job
curl -X POST /v1/fine-tuning/jobs \
-H "Authorization: Bearer sk-ic-YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "meta-llama/Llama-3.1-8B-Instruct",
"training_file": "file-abc123",
"method": "lora",
"hyperparameters": {
"lora_rank": 16,
"learning_rate": "2e-4",
"n_epochs": 3,
"batch_size": 4
}
}'
# 3. Check job status
curl /v1/fine-tuning/jobs/ftjob-xyz789 \
-H "Authorization: Bearer sk-ic-YOUR_API_KEY"
# 4. Stream training events (loss curve)
curl /v1/fine-tuning/jobs/ftjob-xyz789/events \
-H "Authorization: Bearer sk-ic-YOUR_API_KEY"
# 5. Use the fine-tuned model
curl /v1/chat/completions \
-H "Authorization: Bearer sk-ic-YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "ft:llama-3.1-8b:my-lora:ftjob-xyz789",
"messages": [{"role": "user", "content": "Hello!"}]
}'
# Cancel a running job
curl -X DELETE /v1/fine-tuning/jobs/ftjob-xyz789 \
-H "Authorization: Bearer sk-ic-YOUR_API_KEY"
# List all training files
curl /v1/files \
-H "Authorization: Bearer sk-ic-YOUR_API_KEY"Virtual Machines
Create and manage full virtual machines with GPU passthrough, networking, and persistent storage via KubeVirt.
# List VM templates (public)
curl /api/v1/vm-templates
# Create a VM
curl -X POST /api/v1/vms \
-H "Authorization: Bearer YOUR_OIDC_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "ml-workstation",
"templateId": "tpl-ubuntu-gpu",
"cpu": 8,
"memoryGb": 32,
"diskGb": 100,
"sshKey": "ssh-ed25519 AAAA..."
}'
# List your VMs
curl /api/v1/vms \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"
# Stop / Start / Reboot
curl -X POST /api/v1/vms/vm-abc123/stop \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"
curl -X POST /api/v1/vms/vm-abc123/start \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"
curl -X POST /api/v1/vms/vm-abc123/reboot \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"
# Manage VM networks
curl -X POST /api/v1/vms/vm-abc123/networks \
-H "Authorization: Bearer YOUR_OIDC_TOKEN" \
-H "Content-Type: application/json" \
-d '{"name": "training-net", "cidr": "10.100.0.0/24"}'
# Delete a VM
curl -X DELETE /api/v1/vms/vm-abc123 \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"Kubernetes Clusters
Provision isolated virtual Kubernetes clusters (vCluster) with GPU access. Each cluster gets its own control plane, kubeconfig, and resource isolation.
# Create a cluster
curl -X POST /api/v1/clusters \
-H "Authorization: Bearer YOUR_OIDC_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "training-cluster",
"version": "1.31"
}'
# List clusters
curl /api/v1/clusters \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"
# Download kubeconfig
curl /api/v1/clusters/cls-abc123/kubeconfig \
-H "Authorization: Bearer YOUR_OIDC_TOKEN" \
-o kubeconfig.yaml
# Use the cluster
export KUBECONFIG=kubeconfig.yaml
kubectl get nodes
kubectl run gpu-test --image=nvidia/cuda:12.8.0-base-ubuntu24.04 \
--limits=nvidia.com/gpu=1 -- nvidia-smi
# Delete the cluster
curl -X DELETE /api/v1/clusters/cls-abc123 \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"Webhooks
Receive real-time HTTP callbacks when events occur. Deliveries are signed with HMAC-SHA256 and retried with exponential backoff.
# Create a webhook endpoint
curl -X POST /api/v1/webhooks \
-H "Authorization: Bearer YOUR_OIDC_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"url": "https://your-app.example.com/webhook",
"events": [
"instance.created",
"instance.terminated",
"balance.low",
"function.completed",
"fine_tuning.completed"
]
}'
# List webhook endpoints
curl /api/v1/webhooks \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"
# Test a webhook (sends a test delivery)
curl -X POST /api/v1/webhooks/wh-abc123/test \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"
# View delivery history
curl /api/v1/webhooks/wh-abc123/deliveries \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"
# Delete a webhook
curl -X DELETE /api/v1/webhooks/wh-abc123 \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"
# Available events:
# instance.created, instance.started, instance.stopped,
# instance.terminated, instance.error
# vm.created, vm.started, vm.stopped, vm.terminated, vm.error
# balance.low, balance.depleted
# key.created, key.deleted, key.expired
# function.completed, function.failed
# fine_tuning.started, fine_tuning.completed, fine_tuning.failed
Spending Alerts
Set up balance threshold notifications to avoid unexpected depletion. Alerts fire via webhook or in-app notification.
# Create a spending alert
curl -X POST /api/v1/spending-alerts \
-H "Authorization: Bearer YOUR_OIDC_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "Low balance warning",
"threshold": 100000,
"notifyVia": "webhook"
}'
# List alerts
curl /api/v1/spending-alerts \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"
# Update an alert threshold
curl -X PATCH /api/v1/spending-alerts/alert-abc123 \
-H "Authorization: Bearer YOUR_OIDC_TOKEN" \
-H "Content-Type: application/json" \
-d '{"threshold": 50000}'
# Delete an alert
curl -X DELETE /api/v1/spending-alerts/alert-abc123 \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"SSH Keys
Manage SSH public keys for GPU instance and VM access. Keys can be organised into groups for team-based access control.
# List your SSH keys
curl /api/v1/ssh-keys \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"
# Upload a new SSH key
curl -X POST /api/v1/ssh-keys \
-H "Authorization: Bearer YOUR_OIDC_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "work-laptop",
"public_key": "ssh-ed25519 AAAA... user@host"
}'
# Generate an SSH key pair (private key returned once)
curl -X POST /api/v1/ssh-keys/generate \
-H "Authorization: Bearer YOUR_OIDC_TOKEN" \
-H "Content-Type: application/json" \
-d '{"name": "generated-key"}'
# Delete a key
curl -X DELETE /api/v1/ssh-keys/key-abc123 \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"SSH Key Groups
Organise SSH keys into named groups for team-based access. Assign a group when creating instances to grant access to all keys in that group.
# Create a key group
curl -X POST /api/v1/ssh-key-groups \
-H "Authorization: Bearer YOUR_OIDC_TOKEN" \
-H "Content-Type: application/json" \
-d '{"name": "ml-team", "description": "ML engineering team keys"}'
# Add a key to a group
curl -X POST /api/v1/ssh-key-groups/grp-abc123/keys \
-H "Authorization: Bearer YOUR_OIDC_TOKEN" \
-H "Content-Type: application/json" \
-d '{"ssh_key_id": "key-xyz789"}'
# List groups
curl /api/v1/ssh-key-groups \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"
# Remove a key from a group
curl -X DELETE /api/v1/ssh-key-groups/grp-abc123/keys/key-xyz789 \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"API Keys
Create and manage API keys for programmatic access. Keys support scoped permissions (read/write) and resource-level access control.
# Create an API key with scoped permissions
curl -X POST /api/v1/api-keys \
-H "Authorization: Bearer YOUR_OIDC_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "inference-key",
"scopes": ["instances", "models"],
"permissions": ["read", "write"],
"expires_in_days": 90
}'
# Response includes the full key (shown once):
# {"id": "ak-...", "key": "sk-ic-...", "prefix": "sk-ic-abc1"}
# List your API keys
curl /api/v1/api-keys \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"
# Revoke a key
curl -X DELETE /api/v1/api-keys/ak-abc123 \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"Batch Inference Jobs
Submit large-scale inference jobs using the OpenAI Batch API. Upload a JSONL file of requests and retrieve results asynchronously.
# Submit a batch job
curl -X POST /api/v1/batch-jobs \
-H "Authorization: Bearer YOUR_OIDC_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"model": "llama-3-8b",
"requests": [
{"custom_id": "req-1", "messages": [{"role": "user", "content": "Summarise: ..."}]},
{"custom_id": "req-2", "messages": [{"role": "user", "content": "Translate: ..."}]}
]
}'
# Check job status
curl /api/v1/batch-jobs/batch-abc123 \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"
# Get results when complete
curl /api/v1/batch-jobs/batch-abc123/results \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"
# List all batch jobs
curl /api/v1/batch-jobs \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"
# Cancel a running job
curl -X DELETE /api/v1/batch-jobs/batch-abc123 \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"Teams
Organise accounts into teams with role-based access. Roles: owner, admin, member, viewer. Team owners can manage membership and permissions.
# Create a team
curl -X POST /api/v1/teams \
-H "Authorization: Bearer YOUR_OIDC_TOKEN" \
-H "Content-Type: application/json" \
-d '{"name": "ML Research", "description": "Machine learning research team"}'
# List all teams in your tenant
curl /api/v1/teams \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"
# Get teams you belong to
curl /api/v1/teams/my \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"
# Add a member
curl -X POST /api/v1/teams/team-abc123/members \
-H "Authorization: Bearer YOUR_OIDC_TOKEN" \
-H "Content-Type: application/json" \
-d '{"account_id": "acc-xyz789", "role": "member"}'
# Update a member's role
curl -X PATCH /api/v1/teams/team-abc123/members/tmem-def456 \
-H "Authorization: Bearer YOUR_OIDC_TOKEN" \
-H "Content-Type: application/json" \
-d '{"role": "admin"}'
# Remove a member
curl -X DELETE /api/v1/teams/team-abc123/members/tmem-def456 \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"Slurm Environments
Create managed Slurm HPC environments for distributed GPU training. Each environment gets its own Slurm control plane with SSH access.
# Create a Slurm environment
curl -X POST /api/v1/slurm-environments \
-H "Authorization: Bearer YOUR_OIDC_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "distributed-training",
"max_nodes": 4,
"gpu_per_node": 1
}'
# List environments
curl /api/v1/slurm-environments \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"
# SSH into the Slurm head node
# ssh -p <ssh_port> gpuuser@<ssh_host>
# Then submit jobs with: sbatch, srun, squeue
# Delete environment
curl -X DELETE /api/v1/slurm-environments/slurm-abc123 \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"VM Networks & Disks
Manage private networks, subnets, firewall rules, and persistent block storage for virtual machines.
# Create a private network
curl -X POST /api/v1/vm-networks \
-H "Authorization: Bearer YOUR_OIDC_TOKEN" \
-H "Content-Type: application/json" \
-d '{"name": "training-net", "description": "Isolated training network"}'
# Add a subnet
curl -X POST /api/v1/vm-networks/net-abc123/subnets \
-H "Authorization: Bearer YOUR_OIDC_TOKEN" \
-H "Content-Type: application/json" \
-d '{"name": "workers", "cidr": "10.100.0.0/24"}'
# Add a firewall rule
curl -X POST /api/v1/vm-networks/net-abc123/firewall-rules \
-H "Authorization: Bearer YOUR_OIDC_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"direction": "ingress",
"protocol": "tcp",
"port_range": "22",
"source": "0.0.0.0/0",
"action": "allow"
}'
# Create a persistent disk
curl -X POST /api/v1/vm-disks \
-H "Authorization: Bearer YOUR_OIDC_TOKEN" \
-H "Content-Type: application/json" \
-d '{"name": "data-volume", "size_gb": 500}'
# Attach disk to a VM
curl -X POST /api/v1/vm-disks/disk-abc123/attach \
-H "Authorization: Bearer YOUR_OIDC_TOKEN" \
-H "Content-Type: application/json" \
-d '{"vm_id": "vm-xyz789"}'
# Resize a disk
curl -X POST /api/v1/vm-disks/disk-abc123/resize \
-H "Authorization: Bearer YOUR_OIDC_TOKEN" \
-H "Content-Type: application/json" \
-d '{"size_gb": 1000}'Pagination & Idempotency
All list endpoints support cursor-based pagination. Write operations support idempotency keys to safely retry requests.
# Paginated listing
curl "/api/v1/instances?maxResults=10" \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"
# Response: {"instances": [...], "nextToken": "eyJ..."}
# Fetch next page
curl "/api/v1/instances?maxResults=10&nextToken=eyJ..." \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"
# Idempotent instance creation (safe to retry)
curl -X POST /api/v1/instances \
-H "Authorization: Bearer YOUR_OIDC_TOKEN" \
-H "Content-Type: application/json" \
-H "Idempotency-Key: my-unique-request-id-123" \
-d '{
"name": "my-workspace",
"tier": "dedicated",
"ssh_key": "ssh-ed25519 AAAA..."
}'
# Replaying the same Idempotency-Key returns the original response
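A retry wrapper that generates one Idempotency-Key per logical request might look like the following sketch; the post callable and the transient ConnectionError are stand-ins for your actual HTTP client and its failure mode:

```python
import uuid

def create_with_retry(post, payload, attempts=3):
    # One key per logical request: every retry replays the same key,
    # so the server creates the resource at most once.
    key = str(uuid.uuid4())
    last_error = None
    for _ in range(attempts):
        try:
            return post(payload, headers={"Idempotency-Key": key})
        except ConnectionError as e:
            last_error = e
    raise last_error

# Simulated client that fails twice, then succeeds.
seen_keys = []
def flaky_post(payload, headers):
    seen_keys.append(headers["Idempotency-Key"])
    if len(seen_keys) < 3:
        raise ConnectionError("transient failure")
    return {"id": "inst-abc123"}

result = create_with_retry(flaky_post, {"name": "my-workspace"})
print(result, len(set(seen_keys)))  # all three attempts carried the same key
```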
# DryRun — validate without creating
curl -X POST "/api/v1/instances?dryRun=true" \
-H "Authorization: Bearer YOUR_OIDC_TOKEN" \
-H "Content-Type: application/json" \
-d '{"name": "test", "tier": "dedicated"}'
Python SDK
Official Python SDK with sync and async clients, typed models, pagination, and automatic retries.
# Install the SDK
pip install ic-gpu
# Or from source
git clone https://github.com/your-org/ic-gpu-service.git
cd ic-gpu-service/sdk-python
pip install -e .
TypeScript SDK
Official TypeScript SDK using native fetch. Works in Node.js 18+, Bun, Deno, and browsers.
# Install the SDK
npm install @ic-gpu/client
# Or from source
git clone https://github.com/your-org/ic-gpu-service.git
cd ic-gpu-service/sdk-typescript
npm install && npm run build
Terraform Provider
Manage IC GPU resources as infrastructure-as-code. Supports instances, VMs, clusters, model deployments, API keys, SSH keys, webhooks, and spending alerts.
# Install the provider
# Add to your Terraform configuration:
terraform {
required_providers {
icgpu = {
source = "registry.terraform.io/ic-gpu/icgpu"
}
}
}
# Initialise and plan
terraform init
terraform plan
terraform apply
JupyterHub Notebooks
Interactive GPU-accelerated Jupyter notebooks with pre-installed ML frameworks. Authenticate via Keycloak SSO. Choose from CPU, time-sliced GPU, or dedicated GPU profiles.
# JupyterHub is available at:
# https://notebooks.gpu.local
# Login uses Keycloak SSO — same credentials as the dashboard.
# Available server profiles:
# - CPU Only: 2 CPU / 4 GB RAM (lightweight notebooks)
# - GPU Time-sliced: 4 CPU / 16 GB RAM + shared GPU
# - GPU Dedicated: 8 CPU / 24 GB RAM + full GPU
# Pre-installed in the workspace image:
# - Python 3, PyTorch, CUDA 12.8
# - JupyterLab with extensions
# - vLLM, Ollama, SGLang setup scripts
# - IC GPU Python SDK
# Access the platform API from notebooks:
curl http://platform-api.platform.svc.cluster.local:80/v1/models
# Access KubeAI for inference:
curl http://kubeai.kubeai.svc.cluster.local:80/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "llama-3-8b", "messages": [{"role": "user", "content": "Hi"}]}'
Response Headers
LLM API responses include token balance information in headers.
| Header | Description |
|---|---|
| X-Tokens-Remaining | Remaining token balance after this request |
| X-Tokens-Used | Tokens consumed by this request |
| X-Request-Id | Unique request identifier for debugging |
| X-Webhook-Signature | HMAC-SHA256 signature on webhook deliveries |
| Idempotency-Key | Client request token for idempotent operations (echoed back) |
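A small helper that reads these headers after each LLM call to watch the balance (header values arrive as strings; the warning threshold is an arbitrary example):

```python
def check_balance(headers, warn_below=100_000):
    # headers: response header mapping from an LLM API call.
    remaining = int(headers["X-Tokens-Remaining"])
    used = int(headers["X-Tokens-Used"])
    if remaining < warn_below:
        print(f"warning: only {remaining} tokens left (last call used {used})")
    return remaining

print(check_balance({"X-Tokens-Remaining": "95000", "X-Tokens-Used": "412"}))
```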
Error Codes
All errors follow a consistent format with AWS-compatible error codes.
| Code | Type | Description |
|---|---|---|
| 400 | ValidationException | Invalid request parameters or body |
| 401 | AuthenticationError | Invalid or missing API key / OIDC token |
| 402 | InsufficientTokens | Token balance depleted — purchase more tokens |
| 403 | AccessDeniedException | Insufficient permissions for this operation |
| 404 | ResourceNotFoundException | Resource not found |
| 409 | ConflictException | Resource already exists or state conflict |
| 429 | ThrottlingException | Rate limit exceeded — retry after cooldown |
| 500 | InternalServiceException | Unexpected server error |
| 502 | ServiceUnavailableException | Upstream service unavailable — model scaling up |
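Of these, 429 is safely retried with exponential backoff. A sketch with a simulated request in place of a real HTTP call; jitter spreads out retries from concurrent clients:

```python
import random
import time

def call_with_backoff(request, max_attempts=5, base_delay=1.0):
    # request() -> (status_code, body); retry only on throttling (429).
    for attempt in range(max_attempts):
        status, body = request()
        if status != 429:
            return status, body
        delay = base_delay * (2 ** attempt) * random.uniform(0.5, 1.5)
        time.sleep(delay)
    return status, body

# Simulated responses: throttled twice, then success.
responses = iter([(429, None), (429, None), (200, {"ok": True})])
print(call_with_backoff(lambda: next(responses), base_delay=0.001))
```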
Webhook Events
Subscribe to any combination of events. Use * to receive all events.
| Event | Description |
|---|---|
| instance.created | GPU instance provisioned |
| instance.started | GPU instance started (pod running) |
| instance.stopped | GPU instance stopped |
| instance.terminated | GPU instance deleted |
| instance.error | GPU instance entered error state |
| instance.preempted | Batch GPU instance preempted by higher-priority workload |
| vm.created | Virtual machine provisioned |
| vm.started | Virtual machine started |
| vm.stopped | Virtual machine stopped |
| vm.terminated | Virtual machine deleted |
| vm.error | Virtual machine entered error state |
| balance.low | Token balance below spending alert threshold |
| balance.depleted | Token balance reached zero |
| key.created | API key created |
| key.deleted | API key deleted |
| key.expired | API key expired |
| function.completed | GPU function run completed successfully |
| function.failed | GPU function run failed |
| fine_tuning.started | Fine-tuning job started training |
| fine_tuning.completed | Fine-tuning job completed — model ready |
| fine_tuning.failed | Fine-tuning job failed |
Terraform Resources
All resources support import via terraform import. Data sources are read-only.
| Resource / Data Source | Description |
|---|---|
| icgpu_instance | GPU instance lifecycle (create, start, stop, delete) |
| icgpu_vm | Virtual machine lifecycle with GPU passthrough |
| icgpu_cluster | Virtual Kubernetes cluster (vCluster) |
| icgpu_model_deployment | LLM model deployment with auto-scaling |
| icgpu_api_key | API key for LLM and resource access |
| icgpu_ssh_key | SSH key for instance and VM access |
| icgpu_webhook | Webhook endpoint for event notifications |
| icgpu_spending_alert | Balance threshold alert |
| data.icgpu_gpu_tiers | Available GPU tiers and pricing (read-only) |
| data.icgpu_vm_templates | Available VM templates (read-only) |
| data.icgpu_token_packages | Available token packages (read-only) |
| data.icgpu_model_catalogue | Pre-tested model catalogue (read-only) |
Rate Limits
Rate limits are enforced per account using a sliding window. Exceeding the limit returns 429 ThrottlingException.
| Endpoint Group | Limit | Window |
|---|---|---|
| Dashboard API (/api/v1/*) | 120 requests | 1 minute |
| LLM API (/v1/*) | 60 requests | 1 minute |
| Traefik ingress (per IP) | 30 requests avg | 1 minute |
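The sliding-window behaviour can be modelled client-side to stay under the limit before the server rejects anything. A minimal sketch of the same policy (not the server's implementation):

```python
import collections
import time

class SlidingWindowLimiter:
    # Allows at most `limit` events in any trailing `window` seconds.
    def __init__(self, limit, window):
        self.limit, self.window = limit, window
        self.events = collections.deque()

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Drop timestamps that have slid out of the window.
        while self.events and self.events[0] <= now - self.window:
            self.events.popleft()
        if len(self.events) < self.limit:
            self.events.append(now)
            return True
        return False

# Two requests per minute: the third (at t=2s) is refused, but the
# fourth (at t=61s) is allowed because the first has slid out.
limiter = SlidingWindowLimiter(limit=2, window=60.0)
print([limiter.allow(now=t) for t in (0.0, 1.0, 2.0, 61.0)])
# expected: [True, True, False, True]
```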