Meta released Llama 4 on Sunday, and the AI community has been picking through it with the kind of forensic intensity usually reserved for leaked court documents. Two variants dropped simultaneously: Llama 4 Scout, a 17-billion-parameter model optimized for local deployment, and Llama 4 Maverick, a 400-billion-parameter mixture-of-experts model aimed at enterprise workloads. Both are free to download under Meta's open-use license, which permits commercial deployment by organizations with fewer than 700 million monthly active users, a threshold only Meta itself and a handful of other tech giants exceed.
The Scout headline is that it runs on a single Nvidia RTX 4090 or an equivalent consumer GPU with 24 GB of VRAM. That is the threshold developers have been waiting for: a genuinely capable model that a solo developer or small team can run entirely on their own hardware, without paying cloud inference costs and without sending their data to anyone's server. In early benchmarks circulating on developer forums, Scout scores on par with GPT-5.4 mini and Gemini 2.0 Flash on standard coding and reasoning tasks, and slightly above both on structured data extraction.
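As a rough sanity check on the 24 GB figure, here is a minimal back-of-envelope sketch of the memory a 17-billion-parameter model's weights occupy at different precisions. It counts weights only, ignoring KV cache and activation overhead, which add several more gigabytes in practice; the numbers suggest that fitting Scout on a 24 GB card assumes quantized weights rather than full 16-bit precision.

```python
def weight_vram_gb(params_billions: float, bits_per_param: float) -> float:
    """Approximate gigabytes of VRAM needed just to hold the weights.

    params_billions: parameter count in billions (17 for Scout's headline figure)
    bits_per_param: numeric precision (16 = fp16/bf16, 8 = int8, 4 = 4-bit quant)
    """
    return params_billions * 1e9 * (bits_per_param / 8) / 1e9


for bits in (16, 8, 4):
    print(f"17B weights at {bits}-bit: ~{weight_vram_gb(17, bits):.1f} GB")
# 16-bit weights alone (~34 GB) exceed a 4090's 24 GB;
# 8-bit (~17 GB) and 4-bit (~8.5 GB) fit with headroom.
```

The same arithmetic explains why Maverick's 400-billion-parameter footprint stays in enterprise territory: even at 4-bit, its weights alone would need roughly 200 GB.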