IC GPU

API Documentation

OpenAI-compatible API, SDKs, Terraform provider, and serverless GPU functions. Drop-in replacement for your existing code.

Authentication

All API requests require authentication via API key (LLM API) or OIDC token (Dashboard API).

# LLM API — use your API key
curl /v1/chat/completions \
  -H "Authorization: Bearer sk-ic-YOUR_API_KEY" \
  -H "Content-Type: application/json"

# Dashboard API — use your OIDC token
curl /api/v1/instances \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN"

Chat Completions

Generate chat completions using the OpenAI-compatible API. Streaming is supported.

curl /v1/chat/completions \
  -H "Authorization: Bearer sk-ic-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3-8b",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain GPU time-slicing in one paragraph."}
    ],
    "temperature": 0.7,
    "max_tokens": 256
  }'
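The request body above can be assembled programmatically before sending; a minimal Python sketch (the helper name and defaults here are illustrative, not part of the API):

```python
import json

def chat_request(model, messages, temperature=0.7, max_tokens=256, stream=False):
    """Build an OpenAI-style chat completion request body as a JSON string."""
    body = {
        "model": model,
        "messages": messages,
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
    if stream:
        body["stream"] = True
    return json.dumps(body)

payload = chat_request(
    "llama-3-8b",
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain GPU time-slicing in one paragraph."},
    ],
)
```

Send `payload` as the `-d` body with the same `Authorization` and `Content-Type` headers shown above.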

Streaming

Stream responses token-by-token for real-time output.

curl /v1/chat/completions \
  -H "Authorization: Bearer sk-ic-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3-8b",
    "messages": [{"role": "user", "content": "Write a haiku about GPUs."}],
    "stream": true
  }'
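With `"stream": true` the response arrives as Server-Sent Events: each event is a `data:` line carrying a JSON chunk, terminated by a `data: [DONE]` sentinel (the standard OpenAI streaming format, which this sketch assumes):

```python
import json

def parse_sse(lines):
    """Yield content deltas from OpenAI-style SSE lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return  # end-of-stream sentinel
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]

# Simulated stream: a role-only first chunk, then two content deltas.
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
text = "".join(parse_sse(sample))
```

In a real client the lines would come from the HTTP response body rather than a list.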

GPU Instances

Create and manage GPU instances with SSH access. Choose interactive workloads (guaranteed availability) or batch workloads (50% discount, preemptible). Storage is configurable (10–500 GB, default 50 GB) with unified quota enforcement. Three tiers are available: dedicated (full isolation), MIG (hardware memory isolation), and timesliced (shared, no memory isolation; GPU memory is zeroed before each workload).

# List GPU tiers (public, no auth)
curl /api/v1/gpu-tiers

# Create an interactive instance (default 50 GB storage)
curl -X POST /api/v1/instances \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my-training-job",
    "tier": "dedicated",
    "ssh_key": "ssh-ed25519 AAAA...",
    "storage_gb": 100
  }'

# Create a batch instance (50% discount, preemptible)
curl -X POST /api/v1/instances \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "batch-job",
    "tier": "timesliced",
    "ssh_key": "ssh-ed25519 AAAA...",
    "workload_type": "batch",
    "storage_gb": 50
  }'

# List instances
curl /api/v1/instances \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN"

# Start / Stop / Requeue
curl -X POST /api/v1/instances/inst-abc123/stop \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN"

curl -X POST /api/v1/instances/inst-abc123/start \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN"

# Re-queue a preempted batch instance
curl -X POST /api/v1/instances/inst-abc123/requeue \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN"

# Delete instance
curl -X DELETE /api/v1/instances/inst-abc123 \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN"

Deploy vLLM on a GPU Instance

SSH into your GPU instance and run vLLM to serve any Hugging Face model with an OpenAI-compatible API.

# 1. SSH into your instance
ssh -p <PORT> gpuuser@<HOST>

# 2. Run the setup script (creates venv, installs vLLM)
./setup-vllm.sh meta-llama/Llama-3.1-8B-Instruct

# Or install manually:
python3 -m venv ~/.venv/vllm && source ~/.venv/vllm/bin/activate
pip install vllm
python -m vllm.entrypoints.openai.api_server \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --host 0.0.0.0 --port 8000

# 3. From another terminal, forward port 8000 via SSH
ssh -p <PORT> -L 8000:localhost:8000 gpuuser@<HOST>

# 4. Query the model
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Deploy Ollama on a GPU Instance

SSH into your GPU instance and run Ollama for one-command model downloads and serving.

# 1. SSH into your instance
ssh -p <PORT> gpuuser@<HOST>

# 2. Run the setup script (installs to ~/bin, models to ~/.ollama)
./setup-ollama.sh llama3

# Or install manually (put ~/bin on PATH before running ollama):
mkdir -p ~/bin
curl -fsSL https://github.com/ollama/ollama/releases/latest/download/ollama-linux-amd64.tar.zst | tar --zstd -x -C /tmp
cp /tmp/bin/ollama ~/bin/ollama && chmod +x ~/bin/ollama
export PATH="$HOME/bin:$PATH"
ollama serve &
ollama pull llama3
ollama run llama3

# 3. From another terminal, forward port 11434 via SSH
ssh -p <PORT> -L 11434:localhost:11434 gpuuser@<HOST>

# 4. Query the model via API
curl http://localhost:11434/api/chat \
  -d '{
    "model": "llama3",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": false
  }'

Deploy SGLang on a GPU Instance

SSH into your GPU instance and run SGLang for high-throughput LLM serving with RadixAttention.

# 1. SSH into your instance
ssh -p <PORT> gpuuser@<HOST>

# 2. Run the setup script (creates venv, installs SGLang)
./setup-sglang.sh meta-llama/Llama-3.1-8B-Instruct

# Or install manually:
python3 -m venv ~/.venv/sglang
source ~/.venv/sglang/bin/activate
pip install "sglang[all]"
python -m sglang.launch_server \
  --model-path meta-llama/Llama-3.1-8B-Instruct \
  --host 0.0.0.0 --port 30000

# 3. From another terminal, forward port 30000 via SSH
ssh -p <PORT> -L 30000:localhost:30000 gpuuser@<HOST>

# 4. Query the model (OpenAI-compatible)
curl http://localhost:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Token Management

Check balance and manage prepaid token credits.

# Check balance
curl /api/v1/tokens/balance \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN"

# Purchase tokens
curl -X POST /api/v1/tokens/purchase \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"amount": 1000000}'

# View usage history
curl /api/v1/tokens/usage \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN"

Embeddings

Generate text embeddings for semantic search, clustering, and classification.

curl /v1/embeddings \
  -H "Authorization: Bearer sk-ic-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bge-large-en-v1.5",
    "input": "GPU computing enables massive parallelism"
  }'

# Batch embeddings — pass an array of strings
curl /v1/embeddings \
  -H "Authorization: Bearer sk-ic-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bge-large-en-v1.5",
    "input": [
      "First document to embed",
      "Second document to embed",
      "Third document to embed"
    ]
  }'
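Embedding vectors are typically compared with cosine similarity for semantic search; a dependency-free sketch (assuming the OpenAI response shape, where vectors appear under `data[i].embedding`):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Identical vectors score 1.0; orthogonal vectors score 0.0.
score = cosine_similarity([1.0, 0.0, 1.0], [1.0, 0.0, 1.0])
```

For large corpora a vector database or approximate nearest-neighbour index replaces the pairwise loop.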

Model Deployments

Deploy and manage LLM models on the platform. Models are served via KubeAI with automatic scaling.

# List your model deployments
curl /api/v1/models \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN"

# Deploy a new model
curl -X POST /api/v1/models \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my-llama",
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "engine": "vllm",
    "minReplicas": 0,
    "maxReplicas": 2,
    "resourceProfile": "nvidia-gpu-dedicated",
    "huggingfaceToken": "hf_..."
  }'

# Scale a deployment
curl -X PUT /api/v1/models/my-llama/scaling \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"minReplicas": 1, "maxReplicas": 3}'

# Stop a model (scale to zero)
curl -X POST /api/v1/models/my-llama/stop \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN"

GPU Functions (Serverless)

Run GPU workloads as serverless functions. Submit a container image and entrypoint, invoke on demand — billed per execution.

# Create a GPU function
curl -X POST /api/v1/functions \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "image-classifier",
    "image": "registry.gpu.local/myteam/classifier:v1",
    "entrypoint": "python classify.py",
    "gpuTier": "timesliced",
    "timeoutSeconds": 300,
    "envVars": {"MODEL_PATH": "/models/resnet50.pt"}
  }'

# Run the function
curl -X POST /api/v1/functions/fn-abc12345/run \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "image_url": "https://example.com/photo.jpg",
      "top_k": 5
    }
  }'
# Returns 202 Accepted with run ID

# Check run status
curl /api/v1/functions/fn-abc12345/runs/run-def67890 \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN"
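Because runs are asynchronous (202 Accepted), clients typically poll the run endpoint until a terminal status. A sketch with an injected fetch function; the status names here are illustrative assumptions:

```python
import time

def wait_for_run(get_status, poll_interval=2.0, max_polls=100,
                 terminal=("completed", "failed")):
    """Poll get_status() until it reports a terminal status."""
    for _ in range(max_polls):
        status = get_status()
        if status in terminal:
            return status
        time.sleep(poll_interval)
    raise TimeoutError("run did not reach a terminal status")

# Simulated run: two in-flight polls, then completion.
statuses = iter(["pending", "running", "completed"])
result = wait_for_run(lambda: next(statuses), poll_interval=0)
```

In practice `get_status` would GET the run endpoint above and read the status field from the JSON response.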

# List all functions
curl /api/v1/functions \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN"

# List runs for a function
curl /api/v1/functions/fn-abc12345/runs \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN"

# Delete a function
curl -X DELETE /api/v1/functions/fn-abc12345 \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN"

Fine-tuning

Fine-tune LLMs with your own data using an OpenAI-compatible API. Supports LoRA and QLoRA methods. Upload training data as JSONL, create a job, and monitor progress.

# 1. Upload a training file (JSONL format)
curl -X POST /v1/files \
  -H "Authorization: Bearer sk-ic-YOUR_API_KEY" \
  -F "file=@training_data.jsonl" \
  -F "purpose=fine-tune"

# Training file format (each line):
# {"messages": [{"role": "system", "content": "..."}, {"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}

# 2. Create a fine-tuning job
curl -X POST /v1/fine-tuning/jobs \
  -H "Authorization: Bearer sk-ic-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "training_file": "file-abc123",
    "method": "lora",
    "hyperparameters": {
      "lora_rank": 16,
      "learning_rate": "2e-4",
      "n_epochs": 3,
      "batch_size": 4
    }
  }'

# 3. Check job status
curl /v1/fine-tuning/jobs/ftjob-xyz789 \
  -H "Authorization: Bearer sk-ic-YOUR_API_KEY"

# 4. Stream training events (loss curve)
curl /v1/fine-tuning/jobs/ftjob-xyz789/events \
  -H "Authorization: Bearer sk-ic-YOUR_API_KEY"

# 5. Use the fine-tuned model
curl /v1/chat/completions \
  -H "Authorization: Bearer sk-ic-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ft:llama-3.1-8b:my-lora:ftjob-xyz789",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

# Cancel a running job
curl -X DELETE /v1/fine-tuning/jobs/ftjob-xyz789 \
  -H "Authorization: Bearer sk-ic-YOUR_API_KEY"

# List all training files
curl /v1/files \
  -H "Authorization: Bearer sk-ic-YOUR_API_KEY"
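The JSONL training format above (one `{"messages": [...]}` object per line) can be generated and sanity-checked with the standard library alone; a minimal sketch:

```python
import json
import os
import tempfile

def write_training_file(path, examples):
    """Write chat-format training examples as one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for messages in examples:
            f.write(json.dumps({"messages": messages}) + "\n")

def validate_training_file(path):
    """Check every line parses and has a non-empty messages list; return count."""
    count = 0
    with open(path, encoding="utf-8") as f:
        for n, line in enumerate(f, start=1):
            record = json.loads(line)
            assert record.get("messages"), f"line {n}: missing messages"
            count = n
    return count

path = os.path.join(tempfile.gettempdir(), "training_data.jsonl")
write_training_file(path, [
    [{"role": "user", "content": "Hi"},
     {"role": "assistant", "content": "Hello!"}],
    [{"role": "system", "content": "Be terse."},
     {"role": "user", "content": "Ping"},
     {"role": "assistant", "content": "Pong"}],
])
count = validate_training_file(path)
```

Upload the resulting file with the `/v1/files` request shown in step 1.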

Virtual Machines

Create and manage full virtual machines with GPU passthrough, networking, and persistent storage via KubeVirt.

# List VM templates (public)
curl /api/v1/vm-templates

# Create a VM
curl -X POST /api/v1/vms \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "ml-workstation",
    "templateId": "tpl-ubuntu-gpu",
    "cpu": 8,
    "memoryGb": 32,
    "diskGb": 100,
    "sshKey": "ssh-ed25519 AAAA..."
  }'

# List your VMs
curl /api/v1/vms \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN"

# Stop / Start / Reboot
curl -X POST /api/v1/vms/vm-abc123/stop \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN"

curl -X POST /api/v1/vms/vm-abc123/start \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN"

curl -X POST /api/v1/vms/vm-abc123/reboot \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN"

# Manage VM networks
curl -X POST /api/v1/vms/vm-abc123/networks \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name": "training-net", "cidr": "10.100.0.0/24"}'

# Delete a VM
curl -X DELETE /api/v1/vms/vm-abc123 \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN"

Kubernetes Clusters

Provision isolated virtual Kubernetes clusters (vCluster) with GPU access. Each cluster gets its own control plane, kubeconfig, and resource isolation.

# Create a cluster
curl -X POST /api/v1/clusters \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "training-cluster",
    "version": "1.31"
  }'

# List clusters
curl /api/v1/clusters \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN"

# Download kubeconfig
curl /api/v1/clusters/cls-abc123/kubeconfig \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN" \
  -o kubeconfig.yaml

# Use the cluster
export KUBECONFIG=kubeconfig.yaml
kubectl get nodes
# (kubectl run no longer supports --limits; request the GPU via --overrides)
kubectl run gpu-test --image=nvidia/cuda:12.8.0-base-ubuntu24.04 \
  --restart=Never \
  --overrides='{"spec":{"containers":[{"name":"gpu-test","image":"nvidia/cuda:12.8.0-base-ubuntu24.04","command":["nvidia-smi"],"resources":{"limits":{"nvidia.com/gpu":"1"}}}]}}'

# Delete the cluster
curl -X DELETE /api/v1/clusters/cls-abc123 \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN"

Webhooks

Receive real-time HTTP callbacks when events occur. Deliveries are signed with HMAC-SHA256 and retried with exponential backoff.

# Create a webhook endpoint
curl -X POST /api/v1/webhooks \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://your-app.example.com/webhook",
    "events": [
      "instance.created",
      "instance.terminated",
      "balance.low",
      "function.completed",
      "fine_tuning.completed"
    ]
  }'

# List webhook endpoints
curl /api/v1/webhooks \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN"

# Test a webhook (sends a test delivery)
curl -X POST /api/v1/webhooks/wh-abc123/test \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN"

# View delivery history
curl /api/v1/webhooks/wh-abc123/deliveries \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN"

# Delete a webhook
curl -X DELETE /api/v1/webhooks/wh-abc123 \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN"

# Available events:
# instance.created, instance.started, instance.stopped,
# instance.terminated, instance.preempted, instance.error
# vm.created, vm.started, vm.stopped, vm.terminated, vm.error
# balance.low, balance.depleted
# key.created, key.deleted, key.expired
# function.completed, function.failed
# fine_tuning.started, fine_tuning.completed, fine_tuning.failed
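Deliveries carry an HMAC-SHA256 signature in the X-Webhook-Signature header (see Response Headers). A verification sketch, assuming the signature is the hex digest of the raw request body under your endpoint secret (the secret value shown is hypothetical):

```python
import hashlib
import hmac

def verify_signature(secret, body, signature):
    """Constant-time check of an HMAC-SHA256 hex signature over the raw body."""
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)

secret = "whsec_example"  # hypothetical endpoint secret
body = b'{"event": "instance.created", "id": "inst-abc123"}'
signature = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
```

Always compare against the raw bytes of the request body, before any JSON parsing or re-serialisation, and use a constant-time comparison to avoid timing leaks.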

Spending Alerts

Set up balance threshold notifications to avoid unexpected depletion. Alerts fire via webhook or in-app notification.

# Create a spending alert
curl -X POST /api/v1/spending-alerts \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Low balance warning",
    "threshold": 100000,
    "notifyVia": "webhook"
  }'

# List alerts
curl /api/v1/spending-alerts \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN"

# Update an alert threshold
curl -X PATCH /api/v1/spending-alerts/alert-abc123 \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"threshold": 50000}'

# Delete an alert
curl -X DELETE /api/v1/spending-alerts/alert-abc123 \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN"

SSH Keys

Manage SSH public keys for GPU instance and VM access. Keys can be organised into groups for team-based access control.

# List your SSH keys
curl /api/v1/ssh-keys \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN"

# Upload a new SSH key
curl -X POST /api/v1/ssh-keys \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "work-laptop",
    "public_key": "ssh-ed25519 AAAA... user@host"
  }'

# Generate an SSH key pair (private key returned once)
curl -X POST /api/v1/ssh-keys/generate \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name": "generated-key"}'

# Delete a key
curl -X DELETE /api/v1/ssh-keys/key-abc123 \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN"

SSH Key Groups

Organise SSH keys into named groups for team-based access. Assign a group when creating instances to grant access to all keys in that group.

# Create a key group
curl -X POST /api/v1/ssh-key-groups \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name": "ml-team", "description": "ML engineering team keys"}'

# Add a key to a group
curl -X POST /api/v1/ssh-key-groups/grp-abc123/keys \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"ssh_key_id": "key-xyz789"}'

# List groups
curl /api/v1/ssh-key-groups \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN"

# Remove a key from a group
curl -X DELETE /api/v1/ssh-key-groups/grp-abc123/keys/key-xyz789 \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN"

API Keys

Create and manage API keys for programmatic access. Keys support scoped permissions (read/write) and resource-level access control.

# Create an API key with scoped permissions
curl -X POST /api/v1/api-keys \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "inference-key",
    "scopes": ["instances", "models"],
    "permissions": ["read", "write"],
    "expires_in_days": 90
  }'
# Response includes the full key (shown once):
# {"id": "ak-...", "key": "sk-ic-...", "prefix": "sk-ic-abc1"}

# List your API keys
curl /api/v1/api-keys \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN"

# Revoke a key
curl -X DELETE /api/v1/api-keys/ak-abc123 \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN"

Resource Tags

AWS-compatible key/value tagging for any resource. Use tags to organise, filter, and track resources across your infrastructure.

# Tag a resource
curl -X POST /api/v1/tags \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "resource_type": "instance",
    "resource_id": "inst-abc123",
    "tags": [
      {"key": "environment", "value": "production"},
      {"key": "team", "value": "ml-research"}
    ]
  }'

# List tags for a resource
curl "/api/v1/tags?resource_type=instance&resource_id=inst-abc123" \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN"

# Filter resources by tag
curl "/api/v1/instances?tag:environment=production" \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN"

# Delete tags
curl -X DELETE /api/v1/tags \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "resource_type": "instance",
    "resource_id": "inst-abc123",
    "tag_keys": ["environment"]
  }'

Batch Inference Jobs

Submit large-scale inference jobs using the OpenAI Batch API. Upload a JSONL file of requests and retrieve results asynchronously.

# Submit a batch job
curl -X POST /api/v1/batch-jobs \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3-8b",
    "requests": [
      {"custom_id": "req-1", "messages": [{"role": "user", "content": "Summarise: ..."}]},
      {"custom_id": "req-2", "messages": [{"role": "user", "content": "Translate: ..."}]}
    ]
  }'
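The `requests` array above can be generated from a plain list of prompts; a minimal sketch (the `custom_id` numbering scheme is illustrative):

```python
def make_batch_requests(prompts):
    """Build batch request objects, each with a unique custom_id."""
    return [
        {"custom_id": f"req-{i}",
         "messages": [{"role": "user", "content": prompt}]}
        for i, prompt in enumerate(prompts, start=1)
    ]

requests = make_batch_requests(["Summarise: ...", "Translate: ..."])
```

The `custom_id` values let you match each result back to its originating request when you fetch `/results`.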

# Check job status
curl /api/v1/batch-jobs/batch-abc123 \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN"

# Get results when complete
curl /api/v1/batch-jobs/batch-abc123/results \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN"

# List all batch jobs
curl /api/v1/batch-jobs \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN"

# Cancel a running job
curl -X DELETE /api/v1/batch-jobs/batch-abc123 \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN"

Teams

Organise accounts into teams with role-based access. Roles: owner, admin, member, viewer. Team owners can manage membership and permissions.

# Create a team
curl -X POST /api/v1/teams \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name": "ML Research", "description": "Machine learning research team"}'

# List all teams in your tenant
curl /api/v1/teams \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN"

# Get teams you belong to
curl /api/v1/teams/my \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN"

# Add a member
curl -X POST /api/v1/teams/team-abc123/members \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"account_id": "acc-xyz789", "role": "member"}'

# Update a member's role
curl -X PATCH /api/v1/teams/team-abc123/members/tmem-def456 \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"role": "admin"}'

# Remove a member
curl -X DELETE /api/v1/teams/team-abc123/members/tmem-def456 \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN"

Slurm Environments

Create managed Slurm HPC environments for distributed GPU training. Each environment gets its own Slurm control plane with SSH access.

# Create a Slurm environment
curl -X POST /api/v1/slurm-environments \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "distributed-training",
    "max_nodes": 4,
    "gpu_per_node": 1
  }'

# List environments
curl /api/v1/slurm-environments \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN"

# SSH into the Slurm head node
# ssh -p <ssh_port> gpuuser@<ssh_host>
# Then submit jobs with: sbatch, srun, squeue

# Delete environment
curl -X DELETE /api/v1/slurm-environments/slurm-abc123 \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN"

VM Networks & Disks

Manage private networks, subnets, firewall rules, and persistent block storage for virtual machines.

# Create a private network
curl -X POST /api/v1/vm-networks \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name": "training-net", "description": "Isolated training network"}'

# Add a subnet
curl -X POST /api/v1/vm-networks/net-abc123/subnets \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name": "workers", "cidr": "10.100.0.0/24"}'

# Add a firewall rule
curl -X POST /api/v1/vm-networks/net-abc123/firewall-rules \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "direction": "ingress",
    "protocol": "tcp",
    "port_range": "22",
    "source": "0.0.0.0/0",
    "action": "allow"
  }'

# Create a persistent disk
curl -X POST /api/v1/vm-disks \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name": "data-volume", "size_gb": 500}'

# Attach disk to a VM
curl -X POST /api/v1/vm-disks/disk-abc123/attach \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"vm_id": "vm-xyz789"}'

# Resize a disk
curl -X POST /api/v1/vm-disks/disk-abc123/resize \
  -H "Authorization: Bearer YOUR_OIDC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"size_gb": 1000}'

Python SDK

Official Python SDK with sync and async clients, typed models, pagination, and automatic retries.

# Install the SDK
pip install ic-gpu

# Or from source
git clone https://github.com/your-org/ic-gpu-service.git
cd ic-gpu-service/sdk-python
pip install -e .

TypeScript SDK

Official TypeScript SDK using native fetch. Works in Node.js 18+, Bun, Deno, and browsers.

# Install the SDK
npm install @ic-gpu/client

# Or from source
git clone https://github.com/your-org/ic-gpu-service.git
cd ic-gpu-service/sdk-typescript
npm install && npm run build

Terraform Provider

Manage IC GPU resources as infrastructure-as-code. Supports instances, VMs, clusters, model deployments, API keys, SSH keys, webhooks, and spending alerts.

# Install the provider
# Add to your Terraform configuration:
terraform {
  required_providers {
    icgpu = {
      source = "registry.terraform.io/ic-gpu/icgpu"
    }
  }
}

# Initialise and plan
terraform init
terraform plan
terraform apply

JupyterHub Notebooks

Interactive GPU-accelerated Jupyter notebooks with pre-installed ML frameworks. Authenticate via Keycloak SSO. Choose from CPU, time-sliced GPU, or dedicated GPU profiles.

# JupyterHub is available at:
# https://notebooks.gpu.local

# Login uses Keycloak SSO — same credentials as the dashboard.

# Available server profiles:
# - CPU Only:        2 CPU / 4 GB RAM (lightweight notebooks)
# - GPU Time-sliced: 4 CPU / 16 GB RAM + shared GPU
# - GPU Dedicated:   8 CPU / 24 GB RAM + full GPU

# Pre-installed in the workspace image:
# - Python 3, PyTorch, CUDA 12.8
# - JupyterLab with extensions
# - vLLM, Ollama, SGLang setup scripts
# - IC GPU Python SDK

# Access the platform API from notebooks:
curl http://platform-api.platform.svc.cluster.local:80/v1/models

# Access KubeAI for inference:
curl http://kubeai.kubeai.svc.cluster.local:80/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama-3-8b", "messages": [{"role": "user", "content": "Hi"}]}'

Response Headers

API responses include metadata headers; LLM API responses additionally report token balance information.

Header                 Description
X-Tokens-Remaining     Remaining token balance after this request
X-Tokens-Used          Tokens consumed by this request
X-Request-Id           Unique request identifier for debugging
X-Webhook-Signature    HMAC-SHA256 signature on webhook deliveries
Idempotency-Key        Client request token for idempotent operations (echoed back)

Error Codes

All errors follow a consistent format with AWS-compatible error codes.

Code  Type                         Description
400   ValidationException          Invalid request parameters or body
401   AuthenticationError          Invalid or missing API key / OIDC token
402   InsufficientTokens           Token balance depleted — purchase more tokens
403   AccessDeniedException        Insufficient permissions for this operation
404   ResourceNotFoundException    Resource not found
409   ConflictException            Resource already exists or state conflict
429   ThrottlingException          Rate limit exceeded — retry after cooldown
500   InternalServiceException     Unexpected server error
502   ServiceUnavailableException  Upstream service unavailable — model scaling up
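A 429 ThrottlingException (and transient 5xx errors) is usually handled with exponential backoff. A sketch of the delay schedule; the base and cap values are illustrative, and a Retry-After header, when present, should take precedence:

```python
def backoff_delays(retries, base=1.0, cap=30.0):
    """Exponential backoff schedule in seconds: base * 2^attempt, capped."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(retries)]

delays = backoff_delays(6)  # 1, 2, 4, 8, 16, then capped at 30
```

Production clients usually add random jitter to each delay so concurrent callers do not retry in lockstep.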

Webhook Events

Subscribe to any combination of events. Use * to receive all events.

Event                  Description
instance.created       GPU instance provisioned
instance.started       GPU instance started (pod running)
instance.stopped       GPU instance stopped
instance.terminated    GPU instance deleted
instance.error         GPU instance entered error state
instance.preempted     Batch GPU instance preempted by higher-priority workload
vm.created             Virtual machine provisioned
vm.started             Virtual machine started
vm.stopped             Virtual machine stopped
vm.terminated          Virtual machine deleted
vm.error               Virtual machine entered error state
balance.low            Token balance below spending alert threshold
balance.depleted       Token balance reached zero
key.created            API key created
key.deleted            API key deleted
key.expired            API key expired
function.completed     GPU function run completed successfully
function.failed        GPU function run failed
fine_tuning.started    Fine-tuning job started training
fine_tuning.completed  Fine-tuning job completed — model ready
fine_tuning.failed     Fine-tuning job failed

Terraform Resources

All resources support import via terraform import. Data sources are read-only.

Resource / Data Source      Description
icgpu_instance              GPU instance lifecycle (create, start, stop, delete)
icgpu_vm                    Virtual machine lifecycle with GPU passthrough
icgpu_cluster               Virtual Kubernetes cluster (vCluster)
icgpu_model_deployment      LLM model deployment with auto-scaling
icgpu_api_key               API key for LLM and resource access
icgpu_ssh_key               SSH key for instance and VM access
icgpu_webhook               Webhook endpoint for event notifications
icgpu_spending_alert        Balance threshold alert
data.icgpu_gpu_tiers        Available GPU tiers and pricing (read-only)
data.icgpu_vm_templates     Available VM templates (read-only)
data.icgpu_token_packages   Available token packages (read-only)
data.icgpu_model_catalogue  Pre-tested model catalogue (read-only)

Rate Limits

Rate limits are enforced per account using a sliding window. Exceeding the limit returns 429 ThrottlingException.

Endpoint Group               Limit            Window
Dashboard API (/api/v1/*)    120 requests     1 minute
LLM API (/v1/*)              60 requests      1 minute
Traefik ingress (per IP)     30 requests avg  1 minute
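The sliding-window behaviour can be modelled client-side to stay under the limit before the server returns a 429; a minimal sketch (timestamps are injected to keep the example deterministic):

```python
from collections import deque

class SlidingWindowLimiter:
    """Allow at most `limit` events per `window` seconds."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.events = deque()  # timestamps of accepted events

    def allow(self, now):
        # Drop timestamps that have aged out of the window.
        while self.events and now - self.events[0] >= self.window:
            self.events.popleft()
        if len(self.events) < self.limit:
            self.events.append(now)
            return True
        return False

# Limit of 3 per 60 s: the 4th call inside the window is rejected,
# but a call after the window has slid past the oldest event passes.
limiter = SlidingWindowLimiter(limit=3, window=60)
results = [limiter.allow(t) for t in (0, 1, 2, 3, 61)]
```

A real client would pass `time.monotonic()` as `now` and sleep until the oldest timestamp leaves the window instead of returning False.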