1. BIOS settings
Reboot, hit Del to enter BIOS, and set:
- UMA Frame Buffer Size: 16G (under Advanced → AMD CBS → NBIO → GFX Config, or similar). This is the single most impactful setting — Ollama keys off reserved VRAM size to decide whether to use the iGPU at all.
- IOMMU: Enabled
- SVM Mode (virtualization): Enabled (useful if you’ll ever run containers/VMs)
- Resizable BAR: Enabled
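Once you're back in Linux, you can sanity-check that the 16G carve-out actually took. A minimal sketch: the amdgpu driver exposes the VRAM reservation via sysfs, though the card index varies by machine, so this scans all of them (the `bytes_to_gib` helper name is mine, not a standard API):

```python
from pathlib import Path

def bytes_to_gib(n: int) -> float:
    """Convert a byte count to GiB, rounded to one decimal."""
    return round(n / (1024 ** 3), 1)

# amdgpu exposes the UMA/VRAM carve-out here; card index may differ per machine.
for vram in Path("/sys/class/drm").glob("card*/device/mem_info_vram_total"):
    print(vram.parent.parent.name, bytes_to_gib(int(vram.read_text())), "GiB")
```

If the BIOS setting took, you should see roughly 16.0 GiB reported for the iGPU's card entry.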
2. Install ROCm
On Ubuntu 24.04, use AMD’s amdgpu-install utility rather than distro packages — it’s the only reliable path:
sudo apt update && sudo apt install -y wget gnupg
wget https://repo.radeon.com/amdgpu-install/6.4/ubuntu/noble/amdgpu-install_6.4.60400-1_all.deb
sudo apt install -y ./amdgpu-install_6.4.60400-1_all.deb
sudo amdgpu-install --usecase=rocm --no-dkms

sudo usermod -aG render,video $USER
# reboot
After reboot, verify: rocminfo | grep gfx should show gfx1103.
3. Install Ollama (v0.14+ for Claude Code compatibility)
curl -fsSL https://ollama.com/install.sh | sh
Confirm with ollama --version; you want 0.14 or higher. Stable Ollama has had issues with streaming tool calls that break Claude Code's agentic loop. Until those fixes land in stable, use a recent pre-release (0.14.3-rc1 or later); if you hit problems with Claude Code's tool calling, install it explicitly:
curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.14.3-rc1 sh
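If you want to script the version-floor check rather than eyeball it, a rough sketch that parses the kind of string `ollama --version` prints (the sample strings below are hardcoded for illustration, not captured output):

```python
import re

def meets_floor(version_output: str, floor=(0, 14)) -> bool:
    """True if a version string like '0.14.3-rc1' is at or above the floor."""
    m = re.search(r"(\d+)\.(\d+)", version_output)
    if not m:
        return False
    return (int(m.group(1)), int(m.group(2))) >= floor

print(meets_floor("ollama version is 0.14.3-rc1"))  # True: pre-release is fine
print(meets_floor("ollama version is 0.12.6"))      # False: too old
```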
4. Configure the Ollama systemd service
Edit the service with sudo systemctl edit ollama.service (this opens a drop-in override file) and add:
[Service]
Environment="HSA_OVERRIDE_GFX_VERSION=11.0.2"
Environment="HCC_AMDGPU_TARGET=gfx1103"
Environment="OLLAMA_HOST=127.0.0.1:11434"
Environment="OLLAMA_FLASH_ATTENTION=1"
Environment="OLLAMA_KV_CACHE_TYPE=q8_0"
Environment="OLLAMA_CONTEXT_LENGTH=32768"
Environment="OLLAMA_NUM_PARALLEL=1"
Environment="OLLAMA_MAX_LOADED_MODELS=1"
Environment="OLLAMA_KEEP_ALIVE=30m"
Environment="OLLAMA_GPU_OVERHEAD=0"
Two critical points here for Claude Code specifically:
- OLLAMA_CONTEXT_LENGTH=32768 — the Ollama docs recommend at least a 32K-token context length for Claude Code, and their Claude Code integration page pushes even higher: "Claude Code requires a large context window. We recommend at least 64k tokens." 32K is a reasonable starting point on your hardware, since the KV cache at 64K starts eating significant VRAM. You can bump to 65536 if Claude Code complains about truncation, but expect to drop to a smaller model to compensate.
- OLLAMA_NUM_PARALLEL=1 — critical on iGPUs. Each parallel slot multiplies the KV cache. Keep it at 1.
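To see why these two settings interact, here's a back-of-the-envelope KV cache estimator. The model dimensions (48 layers, 8 KV heads via GQA, head dim 128) are illustrative Qwen-style numbers, not taken from any specific checkpoint, and q8_0 is approximated as 1 byte per element versus 2 for the default f16:

```python
def kv_cache_bytes(ctx, layers, kv_heads, head_dim, bytes_per_elt, parallel=1):
    """K and V caches: 2 tensors of ctx * kv_heads * head_dim per layer, per slot."""
    return 2 * layers * ctx * kv_heads * head_dim * bytes_per_elt * parallel

GIB = 1024 ** 3
# Illustrative dims: 48 layers, 8 KV heads, head_dim 128.
for ctx in (32768, 65536):
    q8 = kv_cache_bytes(ctx, 48, 8, 128, 1) / GIB   # q8_0 ~ 1 byte/element
    f16 = kv_cache_bytes(ctx, 48, 8, 128, 2) / GIB  # default f16
    print(f"ctx={ctx}: q8_0 ~{q8:.1f} GiB, f16 ~{f16:.1f} GiB")
```

Doubling the context or adding a second parallel slot each doubles the figure, which is why 64K context plus extra slots can blow through a 16GB carve-out before the model weights are even counted.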
Then reload and restart:
sudo systemctl daemon-reload
sudo systemctl restart ollama
sudo systemctl enable ollama
5. Pull the right models
For Claude Code agentic work you need tool-calling capable models. Good picks for your hardware:
ollama pull qwen3-coder:30b # MoE, ~3B active — great speed/quality
ollama pull qwen2.5-coder:14b # Dense, excellent coder, fits cleanly in 16GB
ollama pull glm-4.6:9b # Good tool-calling, lighter
qwen3-coder:30b is the sweet spot — it’s MoE so most params are dormant, meaning you get 30B-quality output at roughly 14B-model speeds. Should comfortably hit 10+ tok/s on your rig with the full model in memory (mix of VRAM + GTT spillover into your 96GB pool).
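To measure throughput rather than eyeball it, note that Ollama's /api/generate final response includes eval_count (tokens generated) and eval_duration (in nanoseconds); dividing them gives tok/s. The response dict below is a hardcoded sample shaped like that API's output, not real captured data:

```python
def tokens_per_second(resp: dict) -> float:
    """Throughput from Ollama's /api/generate final response fields."""
    return resp["eval_count"] / resp["eval_duration"] * 1e9  # duration is in ns

# Hardcoded sample: 240 tokens generated in 20 seconds.
sample = {"eval_count": 240, "eval_duration": 20_000_000_000}
print(round(tokens_per_second(sample), 1))  # 12.0
```

You can also get the same numbers interactively with `ollama run <model> --verbose`, which prints an eval rate after each response.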
6. Install Claude Code and point it at Ollama
npm install -g @anthropic-ai/claude-code
Add to ~/.bashrc or ~/.zshrc:
export ANTHROPIC_AUTH_TOKEN="ollama"
export ANTHROPIC_API_KEY=""
export ANTHROPIC_BASE_URL="http://localhost:11434"
Launch with: claude --model qwen3-coder:30b
Realistic expectations
One thing worth being upfront about: local coding agents are much slower than cloud Claude, even on good hardware. A recent real-world benchmark from someone doing this exact setup: "I ran the same codebase investigation task with cloud Claude and local GLM-4.7 via Ollama… 1m 13s (cloud) vs 1h 22m (local). 68x slower. But the outputs were nearly identical." That was on beefier hardware than yours.
Your 780M will be slower still. The practical framing: local Ollama + Claude Code is excellent for privacy-sensitive work, offline coding, overnight batch tasks, and learning — not a replacement for cloud Claude on interactive tight-loop work. Many people run both: cloud Claude Code subscription for the daily driver, local setup for sensitive repos.
Quick verification after setup
# Confirm GPU detected
ollama ps # run after a query — should show 100% GPU for small models
# Check it’s actually using the iGPU
journalctl -u ollama -f | grep -i "rocm\|gpu"
# Basic Claude Code smoke test
mkdir ~/claude-test && cd ~/claude-test
claude --model qwen3-coder:30b
# > "create a hello.py that prints the current date"
If ollama ps shows CPU processing instead of GPU, the HSA_OVERRIDE value is the first thing to change — try 11.0.0 instead of 11.0.2. A few users on the same chip report one or the other works better depending on ROCm version.
