Cairn

Compare devices for local AI

Pick two devices. See the differences on one page.

Can I run AI locally on this device, or that one? The answer comes down to two numbers: memory capacity (VRAM or unified memory) and memory bandwidth. Pick two devices below and Cairn runs the same 50 open-weight models against each, showing which ones fit, at what quantization, and how fast.

Pick your two devices

Kimi K2

1000B · Moonshot AI

DeepSeek V3

685B · DeepSeek

DeepSeek V3.2

685B · DeepSeek

DeepSeek R1 671B

671B · DeepSeek

DeepSeek V3.1

671B · DeepSeek

Qwen 3 Coder 480B (MoE)

480B · Alibaba

Llama 3.1 405B

405B · Meta

Llama 4 Maverick 17B-128E

400B · Meta

Qwen 3 235B (MoE)

235B · Alibaba

Mixtral 8x22B

141B · Mistral AI

Devstral 2 123B

123B · Mistral AI

GPT-OSS 120B

117B · OpenAI

Llama 4 Scout 17B

109B · Meta

Qwen 2.5 72B

72B · Alibaba

Llama 3.1 70B

70B · Meta

Llama 3.3 70B

70B · Meta

DeepSeek R1 Distill 70B

70B · DeepSeek

Mixtral 8x7B

47B · Mistral AI

Qwen 3 32B

32B · Alibaba

Qwen 2.5 Coder 32B

32B · Alibaba

DeepSeek R1 Distill 32B

32B · DeepSeek

Qwen 3 30B-A3B (MoE)

30B · Alibaba

Gemma 3 27B

27B · Google

Mistral Small 24B

24B · Mistral AI

Mistral Small 3.1 24B

24B · Mistral AI

GPT-OSS 20B

21B · OpenAI

Phi-4 14B

14B · Microsoft

Qwen 3 14B

14B · Alibaba

DeepSeek R1 Distill 14B

14B · DeepSeek

Gemma 3 12B

12B · Google

Mistral Nemo 12B

12B · Mistral AI

Llama 3.2 11B Vision

11B · Meta

Qwen 3.5 9B

9B · Alibaba

Gemma 2 9B

9B · Google

Llama 3.1 8B

8B · Meta

Qwen 3 8B

8B · Alibaba

DeepSeek R1 Distill 7B

7B · DeepSeek

Code Llama 7B

7B · Meta

LLaVA 1.6 7B

7B · LLaVA Team

Qwen 2.5 Coder 7B

7B · Alibaba

Qwen 3 4B

4B · Alibaba

Gemma 3 4B

4B · Google

Phi-3.5 Mini 3.8B

3.8B · Microsoft

Llama 3.2 3B

3B · Meta

Qwen 3 1.7B

1.7B · Alibaba

DeepSeek R1 Distill 1.5B

1.5B · DeepSeek

Llama 3.2 1B

1B · Meta

Gemma 3 1B

1B · Google

Qwen 3.5 0.8B

0.8B · Alibaba

Qwen 3 0.6B

0.6B · Alibaba

Common questions about choosing a device for local AI

What device do I need to run AI locally?

6 GB of VRAM is the floor: enough for 7B models at Q4 quantization. 12 GB covers most 13B models, 24 GB opens up 30B, and 70B needs 48 GB or more, or a unified-memory setup like a Mac Studio. NVIDIA RTX (discrete VRAM), Apple Silicon (unified memory), and AMD Radeon all work.
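As a rough illustration, the tiers in this answer can be written as a lookup table. This is a minimal sketch, not Cairn's actual logic: the function name `largest_model_class` and the table layout are hypothetical, and the thresholds are just the Q4 rule-of-thumb numbers quoted above.

```python
from typing import Optional

# Tiers copied from the answer above: (minimum memory in GB, largest model class at Q4).
# Ordered from largest to smallest so the first match wins.
VRAM_TIERS = [
    (48, "70B"),
    (24, "30B"),
    (12, "13B"),
    (6, "7B"),
]


def largest_model_class(vram_gb: float) -> Optional[str]:
    """Return the largest model class the rule of thumb allows.

    Returns None when the device is below the 6 GB floor.
    """
    for min_vram_gb, model_class in VRAM_TIERS:
        if vram_gb >= min_vram_gb:
            return model_class
    return None


if __name__ == "__main__":
    for gb in (4, 8, 16, 24, 48):
        print(f"{gb} GB -> {largest_model_class(gb)}")
```

A 16 GB device lands in the 13B tier here because it clears 12 GB but not 24 GB.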

What's the best device for local AI — Apple Silicon or NVIDIA RTX?

Depends on what you want. Apple wins on capacity: Mac Studio unified memory goes up to 192 GB. NVIDIA wins on bandwidth: an RTX 4090 is ~1 TB/s vs the M2 Pro's 200 GB/s. Bigger models fit on Apple; models that fit in NVIDIA's VRAM run faster there.

Does VRAM or memory bandwidth matter more for local inference?

Both, but they answer different questions. VRAM decides if a model loads. Bandwidth decides how fast it runs. An RTX 4090 has 24 GB VRAM and ~1 TB/s bandwidth: a 7B model at Q4 fits in about 5 GB and runs at 80+ tokens/sec. An Apple M2 Pro has 16 GB unified memory at 200 GB/s: the same model fits but runs at around 20 tokens/sec.
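The link between bandwidth and speed can be sketched with a common rule of thumb: when generating one token at a time, a dense model reads all of its weights from memory once per token, so bandwidth divided by model size gives a ceiling on tokens per second. The function name below is hypothetical and the estimate is an assumption-laden upper bound; real throughput, like the figures quoted above, lands well below it.

```python
def max_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Ceiling on single-stream decode speed for a dense model.

    Assumes decoding is memory-bound and every token reads all weights once.
    Ignores KV-cache reads, compute limits, and software overhead, so real
    numbers will be lower.
    """
    if model_size_gb <= 0:
        raise ValueError("model_size_gb must be positive")
    return bandwidth_gb_s / model_size_gb


if __name__ == "__main__":
    # Bandwidth and model size taken from the answer above (7B at ~5 GB).
    print(max_tokens_per_sec(1000, 5))  # RTX 4090 ceiling: 200.0
    print(max_tokens_per_sec(200, 5))   # M2 Pro ceiling: 40.0
```

Both devices sit below their ceilings in practice, but the 5x bandwidth gap still shows up as a several-fold gap in measured speed.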

Why does the same 70B model fit on one device but not another with the same VRAM?

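Quantization and overhead. At Q4_K_M a 70B model needs ~42 GB for weights; at Q8_0 it needs ~75 GB, which does not fit on a 48 GB card at all. Two 48 GB devices will both load the Q4 weights, but the KV cache for a long context, activation memory, and whatever memory the system reserves can push the total past 48 GB on one of them.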
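The sizes quoted above follow from parameter count times bits per weight. The sketch below is illustrative only: the function names and the `overhead_gb` parameter are hypothetical, Q8_0 is taken as 8.5 bits per weight (8-bit values plus one 16-bit scale per block of 32), and the 4.85 figure for Q4_K_M is an approximate average that varies by model.

```python
# Approximate bits per weight for two GGUF quantization types.
# Q4_K_M is an average and varies slightly from model to model (assumption).
BITS_PER_WEIGHT = {
    "Q4_K_M": 4.85,
    "Q8_0": 8.5,
}


def weights_gb(params_billions: float, quant: str) -> float:
    """Estimated weight size in GB (1 GB = 1e9 bytes)."""
    return params_billions * BITS_PER_WEIGHT[quant] / 8


def fits_in_memory(params_billions: float, quant: str,
                   memory_gb: float, overhead_gb: float = 0.0) -> bool:
    """True if weights plus overhead fit.

    overhead_gb is a hypothetical stand-in for KV cache, activations,
    and memory the system keeps for itself.
    """
    return weights_gb(params_billions, quant) + overhead_gb <= memory_gb


if __name__ == "__main__":
    print(round(weights_gb(70, "Q4_K_M"), 1))  # about 42.4
    print(round(weights_gb(70, "Q8_0"), 1))    # about 74.4
    print(fits_in_memory(70, "Q4_K_M", 48, overhead_gb=2))  # True
    print(fits_in_memory(70, "Q4_K_M", 48, overhead_gb=8))  # False
```

The last two lines show the effect described in the answer: the same Q4 weights fit in 48 GB with a small overhead and stop fitting once context and reserved memory grow.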