What do S, B, and F mean in this tier list?
S means the model uses under 85% of your VRAM — comfortable, with headroom for longer contexts. B is 85–100% — it loads but you're close to the edge. F exceeds 100% and can't be fully loaded onto your GPU.
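The thresholds above can be sketched as a small function. This is an illustrative reading of the rule, not the site's actual code; the name `tier` and its parameters are assumptions made for the example.

```python
def tier(model_gb: float, vram_gb: float) -> str:
    """Return 'S', 'B', or 'F' from the share of VRAM the model would occupy."""
    if vram_gb <= 0:
        raise ValueError("vram_gb must be positive")
    usage = model_gb / vram_gb  # 1.0 means the model fills VRAM exactly
    if usage < 0.85:
        return "S"  # under 85%: comfortable, room for longer contexts
    if usage <= 1.0:
        return "B"  # 85-100%: loads, but close to the edge
    return "F"      # over 100%: cannot be fully loaded onto the GPU
```

For example, `tier(21, 24)` lands in B because 21 GB is 87.5% of 24 GB.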
Every model, sorted by what your machine can do with it.
Can I run AI locally — and which models exactly? Here's the tier list per GPU. S fits comfortably, B barely fits, F won't fit at all — useful for picking a target model before you download a 40 GB checkpoint.
VRAM capacity decides the tier. An RTX 4090 has 24 GB, and a 70B model at Q4 needs ~42 GB, so it overflows. A Mac Studio M2 Ultra configured with 192 GB of unified memory runs the same model with massive headroom.
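The ~42 GB figure can be reproduced with back-of-the-envelope arithmetic: parameters times bits per weight, divided by 8. The sketch below assumes about 4.8 effective bits per weight for Q4 (a value picked here to match the ~42 GB quoted above, since real Q4 variants differ) and ignores KV cache and runtime overhead. `weights_gb` and `Q4_BITS` are hypothetical names.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB: billions of params * bits / 8 bits per byte."""
    return params_billion * bits_per_weight / 8


# Assumption: ~4.8 effective bits per weight for a Q4 quantization.
Q4_BITS = 4.8

need = weights_gb(70, Q4_BITS)  # about 42 GB for a 70B model
for name, vram in [("RTX 4090", 24), ("Mac Studio M2 Ultra", 192)]:
    # Print the estimated share of memory the weights alone would take.
    print(f"{name}: needs ~{need:.0f} GB of {vram} GB ({need / vram:.0%})")
```

The same model lands far over 100% on the 24 GB card and well under 85% on the 192 GB machine, which is why its tier changes per GPU.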
MoE still needs all experts in VRAM even though only some activate per token. Mixtral 8x7B has about 47B total parameters, of which about 13B are active per token. Its memory footprint is set by the full 47B, not the 13B active-parameter number, so it tiers like a 47B dense model for fit but runs closer to a 13B model for speed.
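One way to keep the two numbers apart is to store both and use each for its own question. This is a minimal sketch with a hypothetical `Model` class; the parameter counts are the rounded figures for Mixtral 8x7B (about 47B total, about 13B active).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    total_b: float   # all parameters, in billions (every expert included)
    active_b: float  # parameters used per token; equals total_b for dense models

    def fit_size_b(self) -> float:
        """Size that decides the tier: every expert must sit in memory."""
        return self.total_b

    def speed_size_b(self) -> float:
        """Size that roughly drives per-token compute: only the active experts."""
        return self.active_b


mixtral = Model("Mixtral 8x7B", total_b=47, active_b=13)
dense_47b = Model("dense 47B (hypothetical)", total_b=47, active_b=47)

# Same fit requirement as the dense model, much smaller per-token workload.
print(mixtral.fit_size_b(), mixtral.speed_size_b())
```

Feeding `fit_size_b()` into a size estimate gives the tier; `speed_size_b()` is only a rough guide to throughput.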