API Documentation
OpenAI-compatible API, SDKs, Terraform provider, and serverless GPU functions. Drop-in replacement for your existing code.
Authentication
All API requests require authentication via API key (LLM API) or OIDC token (Dashboard API).
# LLM API — use your API key
curl /v1/chat/completions \
-H "Authorization: Bearer sk-ic-YOUR_API_KEY" \
-H "Content-Type: application/json"
# Dashboard API — use your OIDC token
curl /api/v1/instances \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"
Chat Completions
Generate chat completions through the OpenAI-compatible API. Streaming is supported.
curl /v1/chat/completions \
-H "Authorization: Bearer sk-ic-YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "llama-3-8b",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Explain GPU time-slicing in one paragraph."}
],
"temperature": 0.7,
"max_tokens": 256
}'
Streaming
Stream responses token-by-token for real-time output.
curl /v1/chat/completions \
-H "Authorization: Bearer sk-ic-YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "llama-3-8b",
"messages": [{"role": "user", "content": "Write a haiku about GPUs."}],
"stream": true
}'
GPU Instances
Create and manage GPU instances with SSH access. Choose interactive (guaranteed availability) or batch (50% discount, preemptible) workloads. Configurable storage (10–500 GB, default 50 GB) with unified quota enforcement. Tiers: dedicated (full isolation), MIG (hardware memory isolation), timesliced (shared — no memory isolation, GPU memory zeroed before each workload).
# List GPU tiers (public, no auth)
curl /api/v1/gpu-tiers
# Create an interactive instance (default 50 GB storage)
curl -X POST /api/v1/instances \
-H "Authorization: Bearer YOUR_OIDC_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "my-training-job",
"tier": "dedicated",
"ssh_key": "ssh-ed25519 AAAA...",
"storage_gb": 100
}'
# Create a batch instance (50% discount, preemptible)
curl -X POST /api/v1/instances \
-H "Authorization: Bearer YOUR_OIDC_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "batch-job",
"tier": "timesliced",
"ssh_key": "ssh-ed25519 AAAA...",
"workload_type": "batch",
"storage_gb": 50
}'
# List instances
curl /api/v1/instances \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"
# Start / Stop / Requeue
curl -X POST /api/v1/instances/inst-abc123/stop \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"
curl -X POST /api/v1/instances/inst-abc123/start \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"
# Re-queue a preempted batch instance
curl -X POST /api/v1/instances/inst-abc123/requeue \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"
# Delete instance
curl -X DELETE /api/v1/instances/inst-abc123 \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"Deploy vLLM on a GPU Instance
SSH into your GPU instance and run vLLM to serve any Hugging Face model with an OpenAI-compatible API.
# 1. SSH into your instance
ssh -p <PORT> gpuuser@<HOST>
# 2. Run the setup script (creates venv, installs vLLM)
./setup-vllm.sh meta-llama/Llama-3.1-8B-Instruct
# Or install manually:
python3 -m venv ~/.venv/vllm && source ~/.venv/vllm/bin/activate
pip install vllm
python -m vllm.entrypoints.openai.api_server \
--model meta-llama/Llama-3.1-8B-Instruct \
--host 0.0.0.0 --port 8000
# 3. From another terminal, forward port 8000 via SSH
ssh -p <PORT> -L 8000:localhost:8000 gpuuser@<HOST>
# 4. Query the model
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "meta-llama/Llama-3.1-8B-Instruct",
"messages": [{"role": "user", "content": "Hello!"}]
}'
Deploy Ollama on a GPU Instance
SSH into your GPU instance and run Ollama for one-command model downloads and serving.
# 1. SSH into your instance
ssh -p <PORT> gpuuser@<HOST>
# 2. Run the setup script (installs to ~/bin, models to ~/.ollama)
./setup-ollama.sh llama3
# Or install manually:
mkdir -p ~/bin
curl -fsSL https://github.com/ollama/ollama/releases/latest/download/ollama-linux-amd64.tar.zst | tar --zstd -x -C /tmp
cp /tmp/bin/ollama ~/bin/ollama && chmod +x ~/bin/ollama
ollama serve &
ollama pull llama3
ollama run llama3
# 3. From another terminal, forward port 11434 via SSH
ssh -p <PORT> -L 11434:localhost:11434 gpuuser@<HOST>
# 4. Query the model via API
curl http://localhost:11434/api/chat \
-d '{
"model": "llama3",
"messages": [{"role": "user", "content": "Hello!"}],
"stream": false
}'
Deploy SGLang on a GPU Instance
SSH into your GPU instance and run SGLang for high-throughput LLM serving with RadixAttention.
# 1. SSH into your instance
ssh -p <PORT> gpuuser@<HOST>
# 2. Run the setup script (creates venv, installs SGLang)
./setup-sglang.sh meta-llama/Llama-3.1-8B-Instruct
# Or install manually:
python3 -m venv ~/.venv/sglang
source ~/.venv/sglang/bin/activate
pip install "sglang[all]"
python -m sglang.launch_server \
--model-path meta-llama/Llama-3.1-8B-Instruct \
--host 0.0.0.0 --port 30000
# 3. From another terminal, forward port 30000 via SSH
ssh -p <PORT> -L 30000:localhost:30000 gpuuser@<HOST>
# 4. Query the model (OpenAI-compatible)
curl http://localhost:30000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "meta-llama/Llama-3.1-8B-Instruct",
"messages": [{"role": "user", "content": "Hello!"}]
}'
Token Management
Check balance and manage prepaid token credits.
# Check balance
curl /api/v1/tokens/balance \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"
# Purchase tokens
curl -X POST /api/v1/tokens/purchase \
-H "Authorization: Bearer YOUR_OIDC_TOKEN" \
-H "Content-Type: application/json" \
-d '{"amount": 1000000}'
# View usage history
curl /api/v1/tokens/usage \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"Embeddings
Generate text embeddings for semantic search, clustering, and classification.
curl /v1/embeddings \
-H "Authorization: Bearer sk-ic-YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "bge-large-en-v1.5",
"input": "GPU computing enables massive parallelism"
}'
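The vectors returned by this endpoint can be compared with cosine similarity for semantic search or clustering. A minimal self-contained sketch (the 3-dimensional vectors are toy stand-ins for real embedding output, which is much higher-dimensional):

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for two embedding responses.
v1 = [0.1, 0.9, 0.2]
v2 = [0.1, 0.8, 0.3]
print(round(cosine_similarity(v1, v2), 3))
```

A score near 1.0 means the texts are semantically close; near 0.0 means unrelated.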
# Batch embeddings — pass an array of strings
curl /v1/embeddings \
-H "Authorization: Bearer sk-ic-YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "bge-large-en-v1.5",
"input": [
"First document to embed",
"Second document to embed",
"Third document to embed"
]
}'
Model Deployments
Deploy and manage LLM models on the platform. Models are served via KubeAI with automatic scaling.
# List your model deployments
curl /api/v1/models \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"
# Deploy a new model
curl -X POST /api/v1/models \
-H "Authorization: Bearer YOUR_OIDC_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "my-llama",
"model": "meta-llama/Llama-3.1-8B-Instruct",
"engine": "vllm",
"minReplicas": 0,
"maxReplicas": 2,
"resourceProfile": "nvidia-gpu-dedicated",
"huggingfaceToken": "hf_..."
}'
# Scale a deployment
curl -X PUT /api/v1/models/my-llama/scaling \
-H "Authorization: Bearer YOUR_OIDC_TOKEN" \
-H "Content-Type: application/json" \
-d '{"minReplicas": 1, "maxReplicas": 3}'
# Stop a model (scale to zero)
curl -X POST /api/v1/models/my-llama/stop \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"GPU Functions (Serverless)
Run GPU workloads as serverless functions. Submit a container image and entrypoint, invoke on demand — billed per execution.
# Create a GPU function
curl -X POST /api/v1/functions \
-H "Authorization: Bearer YOUR_OIDC_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "image-classifier",
"image": "registry.gpu.local/myteam/classifier:v1",
"entrypoint": "python classify.py",
"gpuTier": "timesliced",
"timeoutSeconds": 300,
"envVars": {"MODEL_PATH": "/models/resnet50.pt"}
}'
# Run the function
curl -X POST /api/v1/functions/fn-abc12345/run \
-H "Authorization: Bearer YOUR_OIDC_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"input": {
"image_url": "https://example.com/photo.jpg",
"top_k": 5
}
}'
# Returns 202 Accepted with run ID
# Check run status
curl /api/v1/functions/fn-abc12345/runs/run-def67890 \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"
# List all functions
curl /api/v1/functions \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"
# List runs for a function
curl /api/v1/functions/fn-abc12345/runs \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"
# Delete a function
curl -X DELETE /api/v1/functions/fn-abc12345 \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"Fine-tuning
Fine-tune LLMs with your own data using an OpenAI-compatible API. Supports LoRA and QLoRA methods. Upload training data as JSONL, create a job, and monitor progress.
# 1. Upload a training file (JSONL format)
curl -X POST /v1/files \
-H "Authorization: Bearer sk-ic-YOUR_API_KEY" \
-F "file=@training_data.jsonl" \
-F "purpose=fine-tune"
# Training file format (each line):
# {"messages": [{"role": "system", "content": "..."}, {"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
# 2. Create a fine-tuning job
curl -X POST /v1/fine-tuning/jobs \
-H "Authorization: Bearer sk-ic-YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "meta-llama/Llama-3.1-8B-Instruct",
"training_file": "file-abc123",
"method": "lora",
"hyperparameters": {
"lora_rank": 16,
"learning_rate": "2e-4",
"n_epochs": 3,
"batch_size": 4
}
}'
# 3. Check job status
curl /v1/fine-tuning/jobs/ftjob-xyz789 \
-H "Authorization: Bearer sk-ic-YOUR_API_KEY"
# 4. Stream training events (loss curve)
curl /v1/fine-tuning/jobs/ftjob-xyz789/events \
-H "Authorization: Bearer sk-ic-YOUR_API_KEY"
# 5. Use the fine-tuned model
curl /v1/chat/completions \
-H "Authorization: Bearer sk-ic-YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "ft:llama-3.1-8b:my-lora:ftjob-xyz789",
"messages": [{"role": "user", "content": "Hello!"}]
}'
# Cancel a running job
curl -X DELETE /v1/fine-tuning/jobs/ftjob-xyz789 \
-H "Authorization: Bearer sk-ic-YOUR_API_KEY"
# List all training files
curl /v1/files \
-H "Authorization: Bearer sk-ic-YOUR_API_KEY"Virtual Machines
Create and manage full virtual machines with GPU passthrough, networking, and persistent storage via KubeVirt.
# List VM templates (public)
curl /api/v1/vm-templates
# Create a VM
curl -X POST /api/v1/vms \
-H "Authorization: Bearer YOUR_OIDC_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "ml-workstation",
"templateId": "tpl-ubuntu-gpu",
"cpu": 8,
"memoryGb": 32,
"diskGb": 100,
"sshKey": "ssh-ed25519 AAAA..."
}'
# List your VMs
curl /api/v1/vms \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"
# Stop / Start / Reboot
curl -X POST /api/v1/vms/vm-abc123/stop \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"
curl -X POST /api/v1/vms/vm-abc123/start \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"
curl -X POST /api/v1/vms/vm-abc123/reboot \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"
# Manage VM networks
curl -X POST /api/v1/vms/vm-abc123/networks \
-H "Authorization: Bearer YOUR_OIDC_TOKEN" \
-H "Content-Type: application/json" \
-d '{"name": "training-net", "cidr": "10.100.0.0/24"}'
# Delete a VM
curl -X DELETE /api/v1/vms/vm-abc123 \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"Kubernetes Clusters
Provision isolated virtual Kubernetes clusters (vCluster) with GPU access. Each cluster gets its own control plane, kubeconfig, and resource isolation.
# Create a cluster
curl -X POST /api/v1/clusters \
-H "Authorization: Bearer YOUR_OIDC_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "training-cluster",
"version": "1.31"
}'
# List clusters
curl /api/v1/clusters \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"
# Download kubeconfig
curl /api/v1/clusters/cls-abc123/kubeconfig \
-H "Authorization: Bearer YOUR_OIDC_TOKEN" \
-o kubeconfig.yaml
# Use the cluster
export KUBECONFIG=kubeconfig.yaml
kubectl get nodes
kubectl run gpu-test --image=nvidia/cuda:12.8.0-base-ubuntu24.04 \
--limits=nvidia.com/gpu=1 -- nvidia-smi
# Delete the cluster
curl -X DELETE /api/v1/clusters/cls-abc123 \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"Webhooks
Receive real-time HTTP callbacks when events occur. Deliveries are signed with HMAC-SHA256 and retried with exponential backoff.
# Create a webhook endpoint
curl -X POST /api/v1/webhooks \
-H "Authorization: Bearer YOUR_OIDC_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"url": "https://your-app.example.com/webhook",
"events": [
"instance.created",
"instance.terminated",
"balance.low",
"function.completed",
"fine_tuning.completed"
]
}'
# List webhook endpoints
curl /api/v1/webhooks \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"
# Test a webhook (sends a test delivery)
curl -X POST /api/v1/webhooks/wh-abc123/test \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"
# View delivery history
curl /api/v1/webhooks/wh-abc123/deliveries \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"
# Delete a webhook
curl -X DELETE /api/v1/webhooks/wh-abc123 \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"
# Available events:
# instance.created, instance.started, instance.stopped,
# instance.terminated, instance.error
# vm.created, vm.started, vm.stopped, vm.terminated, vm.error
# balance.low, balance.depleted
# key.created, key.deleted, key.expired
# function.completed, function.failed
# fine_tuning.started, fine_tuning.completed, fine_tuning.failed
Spending Alerts
Set up balance threshold notifications to avoid unexpected depletion. Alerts fire via webhook or in-app notification.
# Create a spending alert
curl -X POST /api/v1/spending-alerts \
-H "Authorization: Bearer YOUR_OIDC_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "Low balance warning",
"threshold": 100000,
"notifyVia": "webhook"
}'
# List alerts
curl /api/v1/spending-alerts \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"
# Update an alert threshold
curl -X PATCH /api/v1/spending-alerts/alert-abc123 \
-H "Authorization: Bearer YOUR_OIDC_TOKEN" \
-H "Content-Type: application/json" \
-d '{"threshold": 50000}'
# Delete an alert
curl -X DELETE /api/v1/spending-alerts/alert-abc123 \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"SSH Keys
Manage SSH public keys for GPU instance and VM access. Keys can be organised into groups for team-based access control.
# List your SSH keys
curl /api/v1/ssh-keys \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"
# Upload a new SSH key
curl -X POST /api/v1/ssh-keys \
-H "Authorization: Bearer YOUR_OIDC_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "work-laptop",
"public_key": "ssh-ed25519 AAAA... user@host"
}'
# Generate an SSH key pair (private key returned once)
curl -X POST /api/v1/ssh-keys/generate \
-H "Authorization: Bearer YOUR_OIDC_TOKEN" \
-H "Content-Type: application/json" \
-d '{"name": "generated-key"}'
# Delete a key
curl -X DELETE /api/v1/ssh-keys/key-abc123 \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"SSH Key Groups
Organise SSH keys into named groups for team-based access. Assign a group when creating instances to grant access to all keys in that group.
# Create a key group
curl -X POST /api/v1/ssh-key-groups \
-H "Authorization: Bearer YOUR_OIDC_TOKEN" \
-H "Content-Type: application/json" \
-d '{"name": "ml-team", "description": "ML engineering team keys"}'
# Add a key to a group
curl -X POST /api/v1/ssh-key-groups/grp-abc123/keys \
-H "Authorization: Bearer YOUR_OIDC_TOKEN" \
-H "Content-Type: application/json" \
-d '{"ssh_key_id": "key-xyz789"}'
# List groups
curl /api/v1/ssh-key-groups \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"
# Remove a key from a group
curl -X DELETE /api/v1/ssh-key-groups/grp-abc123/keys/key-xyz789 \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"API Keys
Create and manage API keys for programmatic access. Keys support scoped permissions (read/write) and resource-level access control.
# Create an API key with scoped permissions
curl -X POST /api/v1/api-keys \
-H "Authorization: Bearer YOUR_OIDC_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "inference-key",
"scopes": ["instances", "models"],
"permissions": ["read", "write"],
"expires_in_days": 90
}'
# Response includes the full key (shown once):
# {"id": "ak-...", "key": "sk-ic-...", "prefix": "sk-ic-abc1"}
# List your API keys
curl /api/v1/api-keys \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"
# Revoke a key
curl -X DELETE /api/v1/api-keys/ak-abc123 \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"Batch Inference Jobs
Submit large-scale inference jobs using the OpenAI Batch API. Upload a JSONL file of requests and retrieve results asynchronously.
# Submit a batch job
curl -X POST /api/v1/batch-jobs \
-H "Authorization: Bearer YOUR_OIDC_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"model": "llama-3-8b",
"requests": [
{"custom_id": "req-1", "messages": [{"role": "user", "content": "Summarise: ..."}]},
{"custom_id": "req-2", "messages": [{"role": "user", "content": "Translate: ..."}]}
]
}'
# Check job status
curl /api/v1/batch-jobs/batch-abc123 \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"
# Get results when complete
curl /api/v1/batch-jobs/batch-abc123/results \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"
# List all batch jobs
curl /api/v1/batch-jobs \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"
# Cancel a running job
curl -X DELETE /api/v1/batch-jobs/batch-abc123 \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"Teams
Organise accounts into teams with role-based access. Roles: owner, admin, member, viewer. Team owners can manage membership and permissions.
# Create a team
curl -X POST /api/v1/teams \
-H "Authorization: Bearer YOUR_OIDC_TOKEN" \
-H "Content-Type: application/json" \
-d '{"name": "ML Research", "description": "Machine learning research team"}'
# List all teams in your tenant
curl /api/v1/teams \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"
# Get teams you belong to
curl /api/v1/teams/my \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"
# Add a member
curl -X POST /api/v1/teams/team-abc123/members \
-H "Authorization: Bearer YOUR_OIDC_TOKEN" \
-H "Content-Type: application/json" \
-d '{"account_id": "acc-xyz789", "role": "member"}'
# Update a member's role
curl -X PATCH /api/v1/teams/team-abc123/members/tmem-def456 \
-H "Authorization: Bearer YOUR_OIDC_TOKEN" \
-H "Content-Type: application/json" \
-d '{"role": "admin"}'
# Remove a member
curl -X DELETE /api/v1/teams/team-abc123/members/tmem-def456 \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"Slurm Environments
Create managed Slurm HPC environments for distributed GPU training. Each environment gets its own Slurm control plane with SSH access.
# Create a Slurm environment
curl -X POST /api/v1/slurm-environments \
-H "Authorization: Bearer YOUR_OIDC_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "distributed-training",
"max_nodes": 4,
"gpu_per_node": 1
}'
# List environments
curl /api/v1/slurm-environments \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"
# SSH into the Slurm head node
# ssh -p <ssh_port> gpuuser@<ssh_host>
# Then submit jobs with: sbatch, srun, squeue
# Delete environment
curl -X DELETE /api/v1/slurm-environments/slurm-abc123 \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"VM Networks & Disks
Manage private networks, subnets, firewall rules, and persistent block storage for virtual machines.
# Create a private network
curl -X POST /api/v1/vm-networks \
-H "Authorization: Bearer YOUR_OIDC_TOKEN" \
-H "Content-Type: application/json" \
-d '{"name": "training-net", "description": "Isolated training network"}'
# Add a subnet
curl -X POST /api/v1/vm-networks/net-abc123/subnets \
-H "Authorization: Bearer YOUR_OIDC_TOKEN" \
-H "Content-Type: application/json" \
-d '{"name": "workers", "cidr": "10.100.0.0/24"}'
# Add a firewall rule
curl -X POST /api/v1/vm-networks/net-abc123/firewall-rules \
-H "Authorization: Bearer YOUR_OIDC_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"direction": "ingress",
"protocol": "tcp",
"port_range": "22",
"source": "0.0.0.0/0",
"action": "allow"
}'
# Create a persistent disk
curl -X POST /api/v1/vm-disks \
-H "Authorization: Bearer YOUR_OIDC_TOKEN" \
-H "Content-Type: application/json" \
-d '{"name": "data-volume", "size_gb": 500}'
# Attach disk to a VM
curl -X POST /api/v1/vm-disks/disk-abc123/attach \
-H "Authorization: Bearer YOUR_OIDC_TOKEN" \
-H "Content-Type: application/json" \
-d '{"vm_id": "vm-xyz789"}'
# Resize a disk
curl -X POST /api/v1/vm-disks/disk-abc123/resize \
-H "Authorization: Bearer YOUR_OIDC_TOKEN" \
-H "Content-Type: application/json" \
-d '{"size_gb": 1000}'Pagination & Idempotency
All list endpoints support cursor-based pagination. Write operations support idempotency keys to safely retry requests.
# Paginated listing
curl "/api/v1/instances?maxResults=10" \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"
# Response: {"instances": [...], "nextToken": "eyJ..."}
# Fetch next page
curl "/api/v1/instances?maxResults=10&nextToken=eyJ..." \
-H "Authorization: Bearer YOUR_OIDC_TOKEN"
# Idempotent instance creation (safe to retry)
curl -X POST /api/v1/instances \
-H "Authorization: Bearer YOUR_OIDC_TOKEN" \
-H "Content-Type: application/json" \
-H "Idempotency-Key: my-unique-request-id-123" \
-d '{
"name": "my-workspace",
"tier": "dedicated",
"ssh_key": "ssh-ed25519 AAAA..."
}'
# Replaying the same Idempotency-Key returns the original response
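A retry wrapper that generates one Idempotency-Key per logical request might look like the following sketch; the post callable and the transient ConnectionError are stand-ins for your actual HTTP client and its failure mode:

```python
import uuid

def create_with_retry(post, payload, attempts=3):
    # One key per logical request: every retry replays the same key,
    # so the server creates the resource at most once.
    key = str(uuid.uuid4())
    last_error = None
    for _ in range(attempts):
        try:
            return post(payload, headers={"Idempotency-Key": key})
        except ConnectionError as e:
            last_error = e
    raise last_error

# Simulated client that fails twice, then succeeds.
seen_keys = []
def flaky_post(payload, headers):
    seen_keys.append(headers["Idempotency-Key"])
    if len(seen_keys) < 3:
        raise ConnectionError("transient failure")
    return {"id": "inst-abc123"}

result = create_with_retry(flaky_post, {"name": "my-workspace"})
print(result, len(set(seen_keys)))  # all three attempts carried the same key
```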
# DryRun — validate without creating
curl -X POST "/api/v1/instances?dryRun=true" \
-H "Authorization: Bearer YOUR_OIDC_TOKEN" \
-H "Content-Type: application/json" \
-d '{"name": "test", "tier": "dedicated"}'
Python SDK
Official Python SDK with sync and async clients, typed models, pagination, and automatic retries.
# Install the SDK
pip install ic-gpu
# Or from source
git clone https://github.com/your-org/ic-gpu-service.git
cd ic-gpu-service/sdk-python
pip install -e .
TypeScript SDK
Official TypeScript SDK using native fetch. Works in Node.js 18+, Bun, Deno, and browsers.
# Install the SDK
npm install @ic-gpu/client
# Or from source
git clone https://github.com/your-org/ic-gpu-service.git
cd ic-gpu-service/sdk-typescript
npm install && npm run build
Terraform Provider
Manage IC GPU resources as infrastructure-as-code. Supports instances, VMs, clusters, model deployments, API keys, SSH keys, webhooks, and spending alerts.
# Install the provider
# Add to your Terraform configuration:
terraform {
required_providers {
icgpu = {
source = "registry.terraform.io/ic-gpu/icgpu"
}
}
}
# Initialise and plan
terraform init
terraform plan
terraform apply
JupyterHub Notebooks
Interactive GPU-accelerated Jupyter notebooks with pre-installed ML frameworks. Authenticate via Keycloak SSO. Choose from CPU, time-sliced GPU, or dedicated GPU profiles.
# JupyterHub is available at:
# https://notebooks.gpu.local
# Login uses Keycloak SSO — same credentials as the dashboard.
# Available server profiles:
# - CPU Only: 2 CPU / 4 GB RAM (lightweight notebooks)
# - GPU Time-sliced: 4 CPU / 16 GB RAM + shared GPU
# - GPU Dedicated: 8 CPU / 24 GB RAM + full GPU
# Pre-installed in the workspace image:
# - Python 3, PyTorch, CUDA 12.8
# - JupyterLab with extensions
# - vLLM, Ollama, SGLang setup scripts
# - IC GPU Python SDK
# Access the platform API from notebooks:
curl http://platform-api.platform.svc.cluster.local:80/v1/models
# Access KubeAI for inference:
curl http://kubeai.kubeai.svc.cluster.local:80/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "llama-3-8b", "messages": [{"role": "user", "content": "Hi"}]}'
Response Headers
LLM API responses include token balance information in headers.
| Header | Description |
|---|---|
| X-Tokens-Remaining | Remaining token balance after this request |
| X-Tokens-Used | Tokens consumed by this request |
| X-Request-Id | Unique request identifier for debugging |
| X-Webhook-Signature | HMAC-SHA256 signature on webhook deliveries |
| Idempotency-Key | Client request token for idempotent operations (echoed back) |
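A small helper that reads these headers after each LLM call to watch the balance (header values arrive as strings; the warning threshold is an arbitrary example):

```python
def check_balance(headers, warn_below=100_000):
    # headers: response header mapping from an LLM API call.
    remaining = int(headers["X-Tokens-Remaining"])
    used = int(headers["X-Tokens-Used"])
    if remaining < warn_below:
        print(f"warning: only {remaining} tokens left (last call used {used})")
    return remaining

print(check_balance({"X-Tokens-Remaining": "95000", "X-Tokens-Used": "412"}))
```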
Error Codes
All errors follow a consistent format with AWS-compatible error codes.
| Code | Type | Description |
|---|---|---|
| 400 | ValidationException | Invalid request parameters or body |
| 401 | AuthenticationError | Invalid or missing API key / OIDC token |
| 402 | InsufficientTokens | Token balance depleted — purchase more tokens |
| 403 | AccessDeniedException | Insufficient permissions for this operation |
| 404 | ResourceNotFoundException | Resource not found |
| 409 | ConflictException | Resource already exists or state conflict |
| 429 | ThrottlingException | Rate limit exceeded — retry after cooldown |
| 500 | InternalServiceException | Unexpected server error |
| 502 | ServiceUnavailableException | Upstream service unavailable — model scaling up |
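Of these, 429 is safely retried with exponential backoff. A sketch with a simulated request in place of a real HTTP call; jitter spreads out retries from concurrent clients:

```python
import random
import time

def call_with_backoff(request, max_attempts=5, base_delay=1.0):
    # request() -> (status_code, body); retry only on throttling (429).
    for attempt in range(max_attempts):
        status, body = request()
        if status != 429:
            return status, body
        delay = base_delay * (2 ** attempt) * random.uniform(0.5, 1.5)
        time.sleep(delay)
    return status, body

# Simulated responses: throttled twice, then success.
responses = iter([(429, None), (429, None), (200, {"ok": True})])
print(call_with_backoff(lambda: next(responses), base_delay=0.001))
```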
Webhook Events
Subscribe to any combination of events. Use * to receive all events.
| Event | Description |
|---|---|
| instance.created | GPU instance provisioned |
| instance.started | GPU instance started (pod running) |
| instance.stopped | GPU instance stopped |
| instance.terminated | GPU instance deleted |
| instance.error | GPU instance entered error state |
| instance.preempted | Batch GPU instance preempted by higher-priority workload |
| vm.created | Virtual machine provisioned |
| vm.started | Virtual machine started |
| vm.stopped | Virtual machine stopped |
| vm.terminated | Virtual machine deleted |
| vm.error | Virtual machine entered error state |
| balance.low | Token balance below spending alert threshold |
| balance.depleted | Token balance reached zero |
| key.created | API key created |
| key.deleted | API key deleted |
| key.expired | API key expired |
| function.completed | GPU function run completed successfully |
| function.failed | GPU function run failed |
| fine_tuning.started | Fine-tuning job started training |
| fine_tuning.completed | Fine-tuning job completed — model ready |
| fine_tuning.failed | Fine-tuning job failed |
Terraform Resources
All resources support import via terraform import. Data sources are read-only.
| Resource / Data Source | Description |
|---|---|
| icgpu_instance | GPU instance lifecycle (create, start, stop, delete) |
| icgpu_vm | Virtual machine lifecycle with GPU passthrough |
| icgpu_cluster | Virtual Kubernetes cluster (vCluster) |
| icgpu_model_deployment | LLM model deployment with auto-scaling |
| icgpu_api_key | API key for LLM and resource access |
| icgpu_ssh_key | SSH key for instance and VM access |
| icgpu_webhook | Webhook endpoint for event notifications |
| icgpu_spending_alert | Balance threshold alert |
| data.icgpu_gpu_tiers | Available GPU tiers and pricing (read-only) |
| data.icgpu_vm_templates | Available VM templates (read-only) |
| data.icgpu_token_packages | Available token packages (read-only) |
| data.icgpu_model_catalogue | Pre-tested model catalogue (read-only) |
Rate Limits
Rate limits are enforced per account using a sliding window. Exceeding the limit returns 429 ThrottlingException.
| Endpoint Group | Limit | Window |
|---|---|---|
| Dashboard API (/api/v1/*) | 120 requests | 1 minute |
| LLM API (/v1/*) | 60 requests | 1 minute |
| Traefik ingress (per IP) | 30 requests avg | 1 minute |
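The sliding-window behaviour can be modelled client-side to stay under the limit before the server rejects anything. A minimal sketch of the same policy (not the server's implementation):

```python
import collections
import time

class SlidingWindowLimiter:
    # Allows at most `limit` events in any trailing `window` seconds.
    def __init__(self, limit, window):
        self.limit, self.window = limit, window
        self.events = collections.deque()

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Drop timestamps that have slid out of the window.
        while self.events and self.events[0] <= now - self.window:
            self.events.popleft()
        if len(self.events) < self.limit:
            self.events.append(now)
            return True
        return False

# Two requests per minute: the third (at t=2s) is refused, but the
# fourth (at t=61s) is allowed because the first has slid out.
limiter = SlidingWindowLimiter(limit=2, window=60.0)
print([limiter.allow(now=t) for t in (0.0, 1.0, 2.0, 61.0)])
# expected: [True, True, False, True]
```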