Cairn

CAN I RUN AI LOCALLY? · A FIELD GUIDE

Can I run AI locally?


Cairn reads your GPU, VRAM, and bandwidth from the browser, then ranks 50+ open-weight LLMs against your hardware. Offline, in about 300 ms.


What your GPU can actually run

6 GB of VRAM runs a 7B model at Q4 quantization. 12 GB covers most 13B models. 24 GB opens up 30B dense models and MoE 70B at Q4. Cairn checks 50+ open-weight LLMs against your hardware so you know what fits before you pull a 40 GB checkpoint.
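The fit check described above can be sketched in a few lines. This is a minimal illustration, not Cairn's actual ranking logic: the footprints are copied from the catalog on this page, while the function name `models_that_fit` and the 1 GB of headroom reserved for context and runtime overhead are assumptions.

```python
# Footprints in GB, copied from a handful of catalog entries on this page.
CATALOG_GB = {
    "Qwen 3 4B": 3.2,
    "DeepSeek R1 Distill 7B": 4.7,
    "Llama 3.1 8B": 5.4,
    "Gemma 3 12B": 8.2,
    "Qwen 3 32B": 20.0,
    "Llama 3.3 70B": 42.0,
}


def models_that_fit(vram_gb, headroom_gb=1.0):
    """Return catalog models that fit in vram_gb, largest first.

    headroom_gb is a hypothetical allowance for context and runtime
    overhead; the page does not state what margin Cairn uses.
    """
    budget = vram_gb - headroom_gb
    fitting = [(name, gb) for name, gb in CATALOG_GB.items() if gb <= budget]
    fitting.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in fitting]


if __name__ == "__main__":
    for vram in (6, 12, 24):
        print(vram, "GB:", models_that_fit(vram))
```

With these numbers a 6 GB card keeps the 7B model and drops the 8B one, which matches the rule of thumb above.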

Want the full picture? Check the tier list, or put two GPUs side-by-side.


All 50 open-weight LLMs


Llama 3.1 8B

Meta

5.4 GB · 8B params · 131,072-token context
Chat, Code · Llama 3.1 license · Dense

Llama 3.1 70B

Meta

42.0 GB · 70B params · 131,072-token context
Chat, Code, Reasoning · Llama 3.1 license · Dense

Llama 3.3 70B

Meta

42.0 GB · 70B params · 131,072-token context
Chat, Code, Reasoning · Llama 3.3 license · Dense

Qwen 3 0.6B

Alibaba

0.6 GB · 0.6B params · 32,768-token context
Chat · Apache 2.0 license · Dense

Qwen 3 4B

Alibaba

3.2 GB · 4B params · 32,768-token context
Chat, Code · Apache 2.0 license · Dense

Qwen 3 8B

Alibaba

5.4 GB · 8B params · 32,768-token context
Chat, Code, Reasoning · Apache 2.0 license · Dense

Qwen 3 32B

Alibaba

20.0 GB · 32B params · 32,768-token context
Chat, Code, Reasoning · Apache 2.0 license · Dense

Qwen 3 235B (MoE)

Alibaba

142 GB · 235B params · 32,768-token context
Chat, Code, Reasoning · Apache 2.0 license · MoE

DeepSeek R1 Distill 7B

DeepSeek

4.7 GB · 7B params · 131,072-token context
Chat, Reasoning · MIT license · Dense

DeepSeek R1 Distill 70B

DeepSeek

42.0 GB · 70B params · 131,072-token context
Chat, Reasoning · MIT license · Dense

DeepSeek V3

DeepSeek

400 GB · 685B params · 131,072-token context
Chat, Code, Reasoning · MIT license · MoE

Gemma 3 4B

Google

3.2 GB · 4B params · 131,072-token context
Chat, Vision · Gemma license · Dense

Gemma 3 12B

Google

8.2 GB · 12B params · 131,072-token context
Chat, Vision · Gemma license · Dense

Gemma 3 27B

Google

17.0 GB · 27B params · 131,072-token context
Chat, Code, Vision · Gemma license · Dense

Mistral Small 24B

Mistral AI

15.0 GB · 24B params · 32,768-token context
Chat, Code · Apache 2.0 license · Dense

Phi-4 14B

Microsoft

9.5 GB · 14B params · 16,384-token context
Chat, Code, Reasoning · MIT license · Dense

Code Llama 7B

Meta

4.7 GB · 7B params · 16,384-token context
Code · Llama 2 license · Dense

LLaVA 1.6 7B

LLaVA Team

4.7 GB · 7B params · 4,096-token context
Chat, Vision · Apache 2.0 license · Dense

Mixtral 8x7B

Mistral AI

28.0 GB · 47B params · 32,768-token context
Chat, Code · Apache 2.0 license · MoE

Qwen 2.5 Coder 32B

Alibaba

20.0 GB · 32B params · 131,072-token context
Code · Apache 2.0 license · Dense

Llama 3.2 1B

Meta

0.7 GB · 1B params · 131,072-token context
Chat · Llama 3.2 license · Dense

Llama 3.2 3B

Meta

2.0 GB · 3B params · 131,072-token context
Chat · Llama 3.2 license · Dense

Llama 3.2 11B Vision

Meta

7.2 GB · 11B params · 131,072-token context
Chat, Vision · Llama 3.2 license · Dense

Llama 3.1 405B

Meta

263 GB · 405B params · 131,072-token context
Chat, Code, Reasoning · Llama 3.1 license · Dense

Llama 4 Scout 17B

Meta

71.0 GB · 109B params · 131,072-token context
Chat, Code, Reasoning, Vision · Llama 4 license · MoE

Llama 4 Maverick 17B-128E

Meta

260 GB · 400B params · 1,048,576-token context
Chat, Code, Reasoning, Vision · Llama 4 license · MoE

GPT-OSS 20B

OpenAI

14.0 GB · 21B params · 131,072-token context
Chat, Code, Reasoning · Apache 2.0 license · MoE

GPT-OSS 120B

OpenAI

76.0 GB · 117B params · 131,072-token context
Chat, Code, Reasoning · Apache 2.0 license · MoE

Mistral Small 3.1 24B

Mistral AI

16.0 GB · 24B params · 131,072-token context
Chat, Code, Vision · Apache 2.0 license · Dense

Mistral Nemo 12B

Mistral AI

7.8 GB · 12B params · 131,072-token context
Chat, Code · Apache 2.0 license · Dense

Mixtral 8x22B

Mistral AI

92.0 GB · 141B params · 65,536-token context
Chat, Code · Apache 2.0 license · MoE

Devstral 2 123B

Mistral AI

80.0 GB · 123B params · 262,144-token context
Code, Reasoning · Mistral Research license · Dense

Qwen 3.5 0.8B

Alibaba

0.6 GB · 0.8B params · 32,768-token context
Chat · Apache 2.0 license · Dense

Qwen 3.5 9B

Alibaba

5.9 GB · 9B params · 32,768-token context
Chat, Code, Reasoning · Apache 2.0 license · Dense

Qwen 3 1.7B

Alibaba

1.1 GB · 1.7B params · 32,768-token context
Chat · Apache 2.0 license · Dense

Qwen 3 14B

Alibaba

9.1 GB · 14B params · 131,072-token context
Chat, Code, Reasoning · Apache 2.0 license · Dense

Qwen 3 30B-A3B (MoE)

Alibaba

20.0 GB · 30B params · 131,072-token context
Chat, Code, Reasoning · Apache 2.0 license · MoE

Qwen 3 Coder 480B (MoE)

Alibaba

312 GB · 480B params · 262,144-token context
Code, Reasoning · Apache 2.0 license · MoE

Qwen 2.5 72B

Alibaba

47.0 GB · 72B params · 131,072-token context
Chat, Code, Reasoning · Qwen license · Dense

Qwen 2.5 Coder 7B

Alibaba

4.6 GB · 7B params · 131,072-token context
Code · Apache 2.0 license · Dense

DeepSeek R1 Distill 1.5B

DeepSeek

1.0 GB · 1.5B params · 131,072-token context
Chat, Reasoning · MIT license · Dense

DeepSeek R1 Distill 14B

DeepSeek

9.1 GB · 14B params · 131,072-token context
Chat, Reasoning · MIT license · Dense

DeepSeek R1 Distill 32B

DeepSeek

21.0 GB · 32B params · 131,072-token context
Chat, Reasoning · MIT license · Dense

DeepSeek R1 671B

DeepSeek

436 GB · 671B params · 131,072-token context
Chat, Code, Reasoning · MIT license · MoE

DeepSeek V3.1

DeepSeek

436 GB · 671B params · 131,072-token context
Chat, Code, Reasoning · MIT license · MoE

DeepSeek V3.2

DeepSeek

445 GB · 685B params · 131,072-token context
Chat, Code, Reasoning · MIT license · MoE

Kimi K2

Moonshot AI

650 GB · 1000B params · 131,072-token context
Chat, Code, Reasoning · Kimi license · MoE

Gemma 3 1B

Google

0.7 GB · 1B params · 32,768-token context
Chat · Gemma license · Dense

Gemma 2 9B

Google

5.9 GB · 9B params · 8,192-token context
Chat · Gemma license · Dense

Phi-3.5 Mini 3.8B

Microsoft

2.5 GB · 3.8B params · 131,072-token context
Chat, Code · MIT license · Dense

Common questions about running AI locally

What's the minimum GPU VRAM to run AI models locally?

At 6 GB of VRAM, stick to 7B-parameter models at Q4 quantization. 12 GB covers most 13B models. 24 GB opens up 30B dense models and MoE 70B at Q4. You'll want 48 GB or more for 70B at Q8.

Is local LLM inference faster than the ChatGPT API?

Speed comes down to your GPU's memory bandwidth. An RTX 4090 runs a 7B model at 80+ tokens per second — about as fast as an API response, minus the network trip.
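A rough way to see why bandwidth dominates: generating each token requires reading every weight of a dense model from VRAM once, so bandwidth divided by model size gives a ceiling on tokens per second. The sketch below assumes the RTX 4090's published memory bandwidth of about 1008 GB/s and uses the 4.7 GB footprint this page lists for 7B models; the function name is hypothetical, and real throughput lands below the ceiling because of compute, context reads, and runtime overhead.

```python
def max_tokens_per_second(bandwidth_gb_s, model_gb):
    """Upper bound on decode speed for a dense model.

    Assumes every generated token streams all weights from VRAM once,
    so the bound is simply bandwidth / model size.
    """
    if model_gb <= 0:
        raise ValueError("model_gb must be positive")
    return bandwidth_gb_s / model_gb


# Assumption: published RTX 4090 memory bandwidth, in GB/s.
RTX_4090_BANDWIDTH_GB_S = 1008.0
# Footprint of a 7B model as listed in the catalog on this page, in GB.
MODEL_7B_GB = 4.7

ceiling = max_tokens_per_second(RTX_4090_BANDWIDTH_GB_S, MODEL_7B_GB)

if __name__ == "__main__":
    print(f"Bandwidth ceiling: about {ceiling:.0f} tokens/s")
```

The ceiling comes out a little over 200 tokens per second, so the 80+ figure quoted above sits comfortably inside it.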

Can I run AI locally on Windows, Mac, or Linux?

All three work. Cairn reads your GPU via WebGPU / WebGL — the inference itself runs in llama.cpp, Ollama, LM Studio, or whatever local runtime you prefer. Model support is the same across operating systems.

What's the difference between Q4_K_M and Q8_0 quantization for local LLMs?

Q4_K_M uses ~0.65 GB per billion parameters with ~1% quality loss vs full precision. Q8_0 roughly doubles the VRAM but keeps ~99.9% quality. Q4 is the default for most consumer GPUs.
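The per-parameter figure turns into a quick estimate. This is a back-of-the-envelope sketch: the 0.65 GB per billion parameters comes from the answer above, the Q8_0 value simply takes "double" at face value (an approximation), and `estimate_vram_gb` is a hypothetical helper name. The estimate covers weights only, not context.

```python
# GB of VRAM per billion parameters.
# Q4_K_M is the figure quoted above; Q8_0 is assumed to be twice that.
GB_PER_BILLION = {
    "Q4_K_M": 0.65,
    "Q8_0": 1.30,
}


def estimate_vram_gb(params_billion, quant="Q4_K_M"):
    """Estimate the weight footprint in GB for a given quantization."""
    if quant not in GB_PER_BILLION:
        raise ValueError(f"unknown quantization: {quant}")
    return round(params_billion * GB_PER_BILLION[quant], 1)


if __name__ == "__main__":
    for size in (7, 13, 70):
        q4 = estimate_vram_gb(size, "Q4_K_M")
        q8 = estimate_vram_gb(size, "Q8_0")
        print(f"{size}B: Q4_K_M about {q4} GB, Q8_0 about {q8} GB")
```

Under these assumptions a 70B model needs roughly 45 GB at Q4_K_M and roughly 91 GB at Q8_0, which is why Q8 at that size calls for far more than a single 24 GB card.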