1. BIOS settings
Reboot, hit Del to enter BIOS, and set:
- UMA Frame Buffer Size: 16G (under Advanced → AMD CBS → NBIO → GFX Config, or similar). This is the single most impactful setting — Ollama keys off reserved VRAM size to decide whether to use the iGPU at all.
- IOMMU: Enabled
- SVM Mode (virtualization): Enabled (useful if you’ll ever run containers/VMs)
- Resizable BAR: Enabled
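Once you're back in Linux, you can sanity-check that the 16G carve-out actually took. A minimal sketch: the amdgpu driver exposes the VRAM reservation via sysfs, though the card index varies by machine, so this scans all of them (the `bytes_to_gib` helper name is mine, not a standard API):

```python
from pathlib import Path

def bytes_to_gib(n: int) -> float:
    """Convert a byte count to GiB, rounded to one decimal."""
    return round(n / (1024 ** 3), 1)

# amdgpu exposes the UMA/VRAM carve-out here; card index may differ per machine.
for vram in Path("/sys/class/drm").glob("card*/device/mem_info_vram_total"):
    print(vram.parent.parent.name, bytes_to_gib(int(vram.read_text())), "GiB")
```

If the BIOS setting took, you should see roughly 16.0 GiB reported for the iGPU's card entry.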
2. Install ROCm
On Ubuntu 24.04, use AMD’s amdgpu-install utility rather than distro packages — it’s the only reliable path:
sudo apt update && sudo apt install -y wget gnupg
wget https://repo.radeon.com/amdgpu-install/6.4/ubuntu/noble/amdgpu-install_6.4.60400-1_all.deb
sudo apt install -y ./amdgpu-install_6.4.60400-1_all.deb
sudo amdgpu-install --usecase=rocm --no-dkms

sudo usermod -aG render,video $USER
# reboot
After reboot, verify: rocminfo | grep gfx should show gfx1103.
3. Install Ollama (v0.14+ for Claude Code compatibility)
curl -fsSL https://ollama.com/install.sh | sh
Confirm with ollama --version; you want 0.14 or higher. Stable Ollama has had issues with streaming tool calls that break Claude Code's agentic loop. Until those fixes land in stable, use a recent pre-release (0.14.3-rc1 or later); if you hit problems with Claude Code's tool calling, install it explicitly:
curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.14.3-rc1 sh
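If you want to script the version-floor check rather than eyeball it, a rough sketch that parses the kind of string `ollama --version` prints (the sample strings below are hardcoded for illustration, not captured output):

```python
import re

def meets_floor(version_output: str, floor=(0, 14)) -> bool:
    """True if a version string like '0.14.3-rc1' is at or above the floor."""
    m = re.search(r"(\d+)\.(\d+)", version_output)
    if not m:
        return False
    return (int(m.group(1)), int(m.group(2))) >= floor

print(meets_floor("ollama version is 0.14.3-rc1"))  # True: pre-release is fine
print(meets_floor("ollama version is 0.12.6"))      # False: too old
```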
4. Configure the Ollama systemd service
Edit the service with sudo systemctl edit ollama.service (this opens a drop-in override file) and add:
[Service]
Environment="HSA_OVERRIDE_GFX_VERSION=11.0.2"
Environment="HCC_AMDGPU_TARGET=gfx1103"
Environment="OLLAMA_HOST=127.0.0.1:11434"
Environment="OLLAMA_FLASH_ATTENTION=1"
Environment="OLLAMA_KV_CACHE_TYPE=q8_0"
Environment="OLLAMA_CONTEXT_LENGTH=32768"
Environment="OLLAMA_NUM_PARALLEL=1"
Environment="OLLAMA_MAX_LOADED_MODELS=1"
Environment="OLLAMA_KEEP_ALIVE=30m"
Environment="OLLAMA_GPU_OVERHEAD=0"
Two critical points here for Claude Code specifically:
- OLLAMA_CONTEXT_LENGTH=32768 — the Ollama docs recommend at least a 32K-token context length for Claude Code, and their Claude Code integration page pushes even higher: "Claude Code requires a large context window. We recommend at least 64k tokens." 32K is a reasonable starting point on your hardware, since the KV cache at 64K starts eating significant VRAM. You can bump to 65536 if Claude Code complains about truncation, but expect to drop to a smaller model to compensate.
- OLLAMA_NUM_PARALLEL=1 — critical on iGPUs. Each parallel slot multiplies the KV cache. Keep it at 1.
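To see why these two settings interact, here's a back-of-the-envelope KV cache estimator. The model dimensions (48 layers, 8 KV heads via GQA, head dim 128) are illustrative Qwen-style numbers, not taken from any specific checkpoint, and q8_0 is approximated as 1 byte per element versus 2 for the default f16:

```python
def kv_cache_bytes(ctx, layers, kv_heads, head_dim, bytes_per_elt, parallel=1):
    """K and V caches: 2 tensors of ctx * kv_heads * head_dim per layer, per slot."""
    return 2 * layers * ctx * kv_heads * head_dim * bytes_per_elt * parallel

GIB = 1024 ** 3
# Illustrative dims: 48 layers, 8 KV heads, head_dim 128.
for ctx in (32768, 65536):
    q8 = kv_cache_bytes(ctx, 48, 8, 128, 1) / GIB   # q8_0 ~ 1 byte/element
    f16 = kv_cache_bytes(ctx, 48, 8, 128, 2) / GIB  # default f16
    print(f"ctx={ctx}: q8_0 ~{q8:.1f} GiB, f16 ~{f16:.1f} GiB")
```

Doubling the context or adding a second parallel slot each doubles the figure, which is why 64K context plus extra slots can blow through a 16GB carve-out before the model weights are even counted.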
Then reload and restart:
sudo systemctl daemon-reload
sudo systemctl restart ollama
sudo systemctl enable ollama
5. Pull the right models
For Claude Code agentic work you need tool-calling capable models. Good picks for your hardware:
ollama pull qwen3-coder:30b # MoE, ~3B active — great speed/quality
ollama pull qwen2.5-coder:14b # Dense, excellent coder, fits cleanly in 16GB
ollama pull glm-4.6:9b # Good tool-calling, lighter
qwen3-coder:30b is the sweet spot — it’s MoE so most params are dormant, meaning you get 30B-quality output at roughly 14B-model speeds. Should comfortably hit 10+ tok/s on your rig with the full model in memory (mix of VRAM + GTT spillover into your 96GB pool).
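To measure throughput rather than eyeball it, note that Ollama's /api/generate final response includes eval_count (tokens generated) and eval_duration (in nanoseconds); dividing them gives tok/s. The response dict below is a hardcoded sample shaped like that API's output, not real captured data:

```python
def tokens_per_second(resp: dict) -> float:
    """Throughput from Ollama's /api/generate final response fields."""
    return resp["eval_count"] / resp["eval_duration"] * 1e9  # duration is in ns

# Hardcoded sample: 240 tokens generated in 20 seconds.
sample = {"eval_count": 240, "eval_duration": 20_000_000_000}
print(round(tokens_per_second(sample), 1))  # 12.0
```

You can also get the same numbers interactively with `ollama run <model> --verbose`, which prints an eval rate after each response.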
6. Install Claude Code and point it at Ollama
npm install -g @anthropic-ai/claude-code
Add to ~/.bashrc or ~/.zshrc:
export ANTHROPIC_AUTH_TOKEN="ollama"
export ANTHROPIC_API_KEY=""
export ANTHROPIC_BASE_URL="http://localhost:11434"
Launch with: claude --model qwen3-coder:30b
Realistic expectations
One thing worth being upfront about: local coding agents are much slower than cloud Claude, even on good hardware. A recent real-world benchmark from someone doing this exact setup: "I ran the same codebase investigation task with cloud Claude and local GLM-4.7 via Ollama… 1m 13s (cloud) vs 1h 22m (local). 68x slower. But the outputs were nearly identical." That was on beefier hardware than yours.
Your 780M will be slower still. The practical framing: local Ollama + Claude Code is excellent for privacy-sensitive work, offline coding, overnight batch tasks, and learning — not a replacement for cloud Claude on interactive tight-loop work. Many people run both: cloud Claude Code subscription for the daily driver, local setup for sensitive repos.
Quick verification after setup
# Confirm GPU detected
ollama ps # run after a query — should show 100% GPU for small models
# Check it’s actually using the iGPU
journalctl -u ollama -f | grep -i "rocm\|gpu"
# Basic Claude Code smoke test
mkdir ~/claude-test && cd ~/claude-test
claude --model qwen3-coder:30b
# > "create a hello.py that prints the current date"
If ollama ps shows CPU processing instead of GPU, the HSA_OVERRIDE value is the first thing to change — try 11.0.0 instead of 11.0.2. A few users on the same chip report one or the other works better depending on ROCm version.
