Meta released Llama 4 on Sunday, and the AI community has been going through it with the kind of forensic intensity usually reserved for leaked court documents. Two model variants dropped simultaneously: Llama 4 Scout, a 17-billion-parameter model optimized for local deployment, and Llama 4 Maverick, a 400-billion-parameter mixture-of-experts model designed for enterprise workloads. Both are free to download under Meta's open-use license, which permits commercial deployment for organizations with fewer than 700 million monthly active users — a threshold that exempts pretty much everyone except Meta itself and a handful of other tech giants.
The Scout headline is that it runs on a single Nvidia RTX 4090 or equivalent consumer GPU with 24 GB of VRAM. That is the threshold developers have been waiting for: a genuinely capable model that a solo developer or small team can run entirely on their own hardware, without paying cloud inference costs and without sending their data to anyone's server. In early benchmarks circulating on developer forums, Scout is scoring on par with GPT-5.4 mini and Gemini 2.0 Flash on standard coding and reasoning tasks, and slightly above both on structured data extraction.
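The back-of-envelope math behind that 24 GB claim is worth spelling out. The sketch below assumes typical quantization levels and a rough flat overhead for the KV cache and activations; only the 17-billion-parameter count comes from the release, and the overhead figure is an illustrative assumption.

```python
# Rough VRAM estimate for running a 17B-parameter model on a 24 GB GPU.
# Everything except the 17B parameter count is an illustrative assumption:
# quantized weight storage plus a flat overhead for KV cache and activations.

def vram_needed_gb(params_billion: float, bits_per_weight: int,
                   overhead_gb: float = 4.0) -> float:
    """Weights in GB plus a rough allowance for KV cache and activations."""
    weights_gb = params_billion * bits_per_weight / 8  # bits -> bytes -> GB
    return weights_gb + overhead_gb

for bits in (16, 8, 4):
    need = vram_needed_gb(17, bits)
    verdict = "fits" if need <= 24 else "does not fit"
    print(f"{bits}-bit weights: ~{need:.1f} GB -> {verdict} in 24 GB")
```

The takeaway: at full 16-bit precision the weights alone (~34 GB) blow past 24 GB, so the single-4090 story implicitly depends on 8-bit or 4-bit quantization, which is standard practice for local deployment.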
Maverick is a different animal. The 400B parameter count sounds large, but the mixture-of-experts architecture means only a fraction of the parameters are active during any given inference pass — about 17 billion, roughly equivalent to Scout's full size. The practical result is that Maverick requires less compute per query than a dense 400B model would, while retaining the knowledge and reasoning depth of a much larger network. On the MMLU benchmark, Maverick scored 87.4, compared to 86.1 for GPT-5.4 and 85.8 for Claude 3.7 Sonnet. On math reasoning (MATH benchmark), Maverick hit 79.6, which is meaningfully above any open-source model released before it.
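The sparse-activation mechanic described above is easy to sketch. The toy forward pass below shows the general top-k routing pattern used by mixture-of-experts layers: a router scores every expert for each token, but only the top-k experts actually run. The dimensions, expert count, and top-k value here are illustrative, not Llama 4's actual configuration, which Meta has not fully detailed in this release.

```python
import numpy as np

# Toy mixture-of-experts layer: a router picks the top-k experts per token,
# so only a fraction of the total parameters is active on any forward pass.
rng = np.random.default_rng(0)
d_model, n_experts, top_k = 64, 8, 2  # illustrative sizes only

router_w = rng.standard_normal((d_model, n_experts))
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]

def moe_forward(x: np.ndarray) -> np.ndarray:
    logits = x @ router_w                  # score each expert for this token
    probs = np.exp(logits - logits.max())  # softmax over expert scores
    probs /= probs.sum()
    chosen = np.argsort(probs)[-top_k:]    # keep only the top-k experts
    gates = probs[chosen] / probs[chosen].sum()
    # Weighted sum over the chosen experts; the other experts never execute.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, chosen))

y = moe_forward(rng.standard_normal(d_model))
print(f"output shape: {y.shape}, experts active: {top_k}/{n_experts}")
```

With 2 of 8 experts active per token, roughly a quarter of the expert parameters do work on any given pass; that is the same principle by which Maverick's ~17B active parameters cost far less per query than a dense 400B model would.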
The reception from developers has been loud and largely positive, though with some caveats. Llama 4's context window is 256,000 tokens on Scout and 1 million on Maverick — competitive with the frontier, but the community will spend the next week pressure-testing whether performance degrades on long-context tasks the way it has with prior Llama versions. Several AI researchers on X noted that the Maverick benchmark results were produced with a "chat-tuned" variant different from the base model, which may affect reproducibility.
Meta CEO Mark Zuckerberg framed the release as part of a long-term bet. "We believe open-source AI is how you build a healthier ecosystem," he wrote in a statement accompanying the release. "Not because it's altruistic — because the best AI products will be built on foundations that everyone can see, audit, and improve." It's a familiar pitch, but it lands differently now than it did when Llama 1 launched in February 2023 to a much smaller audience. The open-source AI landscape has matured significantly since then, and Meta's willingness to release models at this scale has forced every other major lab to reckon with the cost and access arguments.
The practical implications are real. A developer building a legal document review tool, a healthcare company processing patient records, or a government agency with data sovereignty requirements — all of them now have access to a model competitive with the current frontier, deployable entirely within their own infrastructure. That is a non-trivial development. It doesn't eliminate the case for cloud-based frontier models from OpenAI, Anthropic, or Google, but it narrows it.
The one area where the reception has been more guarded is safety. Meta's approach to model safety in Llama 4 involves a layered system: a dedicated Llama Guard 4 classifier model for filtering inputs and outputs, plus an updated Prompt Guard system for detecting jailbreaks. Researchers who have already spent the weekend testing the public weights have found that, as with prior Llama releases, the base weights can be coaxed into producing content that the safety layers are designed to block. Meta acknowledged this in the release notes, describing it as "an inherent property of openly released weights" and noting that it has shared the Llama Guard 4 weights specifically so that deployers can run their own filtering layer.
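The layered pattern Meta describes is straightforward to picture in code. In the sketch below, the guard is a trivial keyword stub standing in for a learned classifier; a real deployment would call the Llama Guard 4 model at both checkpoints instead, and none of the names here reflect an actual Llama Guard API.

```python
# Sketch of the layered-safety pattern: a guard classifier filters the input,
# the base model generates, then the guard filters the output. Both the guard
# and the model below are placeholder stubs, not real Llama 4 / Llama Guard calls.

BLOCKLIST = {"build a bomb"}  # stand-in for a learned classifier's policy

def guard_is_safe(text: str) -> bool:
    """Placeholder for an input/output safety classifier."""
    return not any(term in text.lower() for term in BLOCKLIST)

def generate(prompt: str) -> str:
    """Placeholder for the base model's generation call."""
    return f"model response to: {prompt}"

def safe_generate(prompt: str) -> str:
    if not guard_is_safe(prompt):        # layer 1: filter the input
        return "[blocked by input filter]"
    response = generate(prompt)          # base weights run unfiltered
    if not guard_is_safe(response):      # layer 2: filter the output
        return "[blocked by output filter]"
    return response

print(safe_generate("summarize this contract"))
print(safe_generate("how to build a bomb"))
```

The structure also makes Meta's caveat concrete: the filtering lives outside the base model, so anyone with the raw weights can simply skip the guard calls, which is exactly the "inherent property of openly released weights" the release notes concede.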
That is a reasonable position. It is also a position that will keep the policy debate about open-source AI running at full volume for the foreseeable future.