Question 1

What do S, B, and F mean in this tier list?

Accepted Answer

S means the model uses under 85% of your VRAM — comfortable, with headroom for longer contexts. B is 85–100% — it loads but you're close to the edge. F exceeds 100% and can't be fully loaded onto your GPU.
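The thresholds above can be sketched as a small function. This is a minimal illustration, not the tier list's actual code: the name `tier` and its arguments are hypothetical, and it assumes the model's memory footprint in GB is already known.

```python
def tier(model_gb: float, vram_gb: float) -> str:
    """Return 'S', 'B', or 'F' from the share of VRAM the model would occupy."""
    if vram_gb <= 0:
        raise ValueError("vram_gb must be positive")
    usage = model_gb / vram_gb
    if usage < 0.85:   # under 85%: comfortable, with headroom
        return "S"
    if usage <= 1.0:   # 85-100%: loads, but close to the edge
        return "B"
    return "F"         # over 100%: cannot be fully loaded onto the GPU
```

For example, `tier(12, 24)` falls in S, `tier(22, 24)` in B, and `tier(42, 24)` in F.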

Question 2

Why is a 70B model F-tier on an RTX 4090 but S-tier on a Mac Studio?

Accepted Answer

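The difference is memory capacity. An RTX 4090 has 24 GB of VRAM, and a 70B model at Q4 needs ~42 GB, so it overflows at roughly 175% of VRAM. A Mac Studio with an M2 Ultra can be configured with up to 192 GB of unified memory, so the same model takes roughly 22% of it and has massive headroom.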
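The arithmetic behind those numbers can be worked through in a few lines. This is a rough sketch: the helper `estimate_gb` is hypothetical, and the figure of 4.8 bits per weight is an assumption chosen to reproduce the ~42 GB quoted above. Real Q4 files vary by quantization variant, and the KV cache adds more on top.

```python
def estimate_gb(params_billion: float, bits_per_weight: float) -> float:
    """Rough weight footprint in GB: parameters x bits per weight / 8."""
    return params_billion * bits_per_weight / 8

# Assumed effective bits per weight for a Q4 quantization (not an exact figure).
model_gb = estimate_gb(70, 4.8)

# Memory capacities as quoted in the answer above.
for name, memory_gb in [("RTX 4090", 24), ("Mac Studio M2 Ultra", 192)]:
    usage = model_gb / memory_gb
    print(f"{name}: {model_gb:.0f} GB model uses {usage:.0%} of {memory_gb} GB")
```

Anything over 100% lands in F, and anything under 85% lands in S, which is why the same model sits at opposite ends of the list on these two machines.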

Question 3

How do MoE models rank differently from dense models?

Accepted Answer

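MoE still needs all experts in VRAM even though only some activate per token. A MoE like Mixtral 8x7B has about 47B total parameters, of which about 13B are active per token. Its memory footprint is set by the full 47B (roughly 47 GB at 8 bits per weight), not the 13B active-parameter number, so it tiers like a 47B dense model for fit but runs closer to a 13B dense model for speed.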
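The fit-versus-speed split can be made concrete with a short calculation. This is a sketch under stated assumptions: the helper `weights_gb` is hypothetical, the parameter counts are the approximate Mixtral 8x7B figures from the answer, and 8 bits per weight is assumed so that 1B parameters comes to about 1 GB.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Rough weight footprint in GB; ignores KV cache and runtime overhead."""
    return params_billion * bits_per_weight / 8

# Approximate Mixtral 8x7B figures from the answer above.
TOTAL_PARAMS_B = 47   # all experts; every one must be resident in memory
ACTIVE_PARAMS_B = 13  # parameters actually used for each token

BITS = 8  # assumed precision, chosen so 1B parameters is about 1 GB

resident_gb = weights_gb(TOTAL_PARAMS_B, BITS)  # drives the fit tier
touched_gb = weights_gb(ACTIVE_PARAMS_B, BITS)  # rough proxy for per-token work

print(f"resident: {resident_gb:.0f} GB, touched per token: {touched_gb:.0f} GB")
```

The tier is computed from the resident figure, while generation speed tracks the smaller per-token figure more closely.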

Local LLM tier list by GPU
