Cairn

Local LLM tier list by GPU

Every model, sorted by what your machine can do with it.

Can I run AI locally — and which models exactly? Here's the tier list per GPU. S fits comfortably, B barely fits, F won't fit at all — useful for picking a target model before you download a 40 GB checkpoint.


Common questions about the local LLM tier list

What do S, B, and F mean in this tier list?

S means the model uses under 85% of your VRAM — comfortable, with headroom for longer contexts. B is 85–100% — it loads but you're close to the edge. F exceeds 100% and can't be fully loaded onto your GPU.
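As a rough illustration, the thresholds above can be written as a small function. This is a sketch, not the site's actual code: the name `tier_for` is hypothetical, and it assumes the model's memory requirement is compared against the GPU's total VRAM.

```python
def tier_for(model_gb: float, vram_gb: float) -> str:
    """Return 'S', 'B', or 'F' using the thresholds described above.

    Hypothetical helper: model_gb is the memory the model needs,
    vram_gb is the total memory available on the GPU.
    """
    if vram_gb <= 0:
        raise ValueError("vram_gb must be positive")
    usage = model_gb / vram_gb
    if usage < 0.85:
        return "S"  # under 85%: comfortable, with headroom
    if usage <= 1.0:
        return "B"  # 85-100%: loads, but close to the edge
    return "F"      # over 100%: cannot be fully loaded


if __name__ == "__main__":
    for model_gb in (12, 22, 42):
        print(model_gb, "GB on a 24 GB card:", tier_for(model_gb, 24))
```

On a 24 GB card, a 12 GB model lands in S, a 22 GB model in B, and a 42 GB model in F.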

Why is a 70B model F-tier on an RTX 4090 but S-tier on a Mac Studio?

VRAM capacity. An RTX 4090 has 24 GB, and a 70B model at Q4 needs ~42 GB, so it overflows. A Mac Studio with an M2 Ultra can be configured with up to 192 GB of unified memory, so the same model has massive headroom.
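The ~42 GB figure can be approximated with back-of-the-envelope arithmetic: parameters times bits per weight, divided by 8. The sketch below assumes an effective 4.8 bits per weight for a Q4-style quantization (a value chosen here to match the ~42 GB figure, since 4-bit formats also store scaling data) and ignores KV cache and runtime overhead. The function name is hypothetical.

```python
def estimate_weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (10^9 bytes).

    Ignores KV cache, activations, and runtime overhead.
    """
    return params_billion * bits_per_weight / 8


if __name__ == "__main__":
    # Assumption: ~4.8 effective bits per weight for a Q4-style quantization.
    need_gb = estimate_weights_gb(70, 4.8)
    for name, memory_gb in (("RTX 4090", 24), ("Mac Studio M2 Ultra", 192)):
        usage = need_gb / memory_gb
        print(f"{name}: needs {need_gb:.0f} GB of {memory_gb} GB ({usage:.0%})")
```

Under these assumptions the model needs roughly 175% of the RTX 4090's memory but only about 22% of a 192 GB Mac Studio's, which is why the same model sits at opposite ends of the list.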

How do MoE models rank differently from dense models?

MoE still needs all experts in VRAM even though only some activate per token. A MoE like Mixtral 8x7B has about 47B total parameters, of which about 13B are active per token. Its memory footprint is set by the full 47B (not the 13B active-parameter number), so it tiers like a 47B dense model for fit but runs closer to 13B for speed.
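One way to picture the distinction is to track two parameter counts per model, one for fit and one for speed. This is a minimal sketch with hypothetical names (`ModelSpec`, `fit_params_b`, `speed_params_b`); the Mixtral 8x7B figures are the rounded values of about 47B total and about 13B active.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    """Hypothetical record: parameter counts in billions."""
    name: str
    total_params_b: float   # every expert must be resident in memory
    active_params_b: float  # parameters actually used per token

    @property
    def fit_params_b(self) -> float:
        # Tiering for fit is driven by the total parameter count.
        return self.total_params_b

    @property
    def speed_params_b(self) -> float:
        # Per-token compute is driven by the active parameter count.
        return self.active_params_b


mixtral = ModelSpec("Mixtral 8x7B", total_params_b=47, active_params_b=13)
dense_13b = ModelSpec("Dense 13B", total_params_b=13, active_params_b=13)

if __name__ == "__main__":
    for m in (mixtral, dense_13b):
        print(f"{m.name}: fit like {m.fit_params_b}B, speed like {m.speed_params_b}B")
```

For a dense model the two numbers are identical; for a MoE they diverge, which is why it can rank low for fit yet still generate tokens quickly once loaded.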