Meta released Llama 4 on Sunday, and the AI community has been picking through it with the kind of forensic intensity usually reserved for leaked court documents. Two variants dropped simultaneously: Llama 4 Scout, a 17-billion-parameter model optimized for local deployment, and Llama 4 Maverick, a 400-billion-parameter mixture-of-experts model aimed at enterprise workloads. Both are free to download under Meta's open-use license, which permits commercial deployment by organizations with fewer than 700 million monthly active users, a threshold only Meta itself and a handful of other tech giants exceed.
The Scout headline is that it runs on a single Nvidia RTX 4090 or an equivalent consumer GPU with 24 GB of VRAM. That is the threshold developers have been waiting for: a genuinely capable model that a solo developer or small team can run entirely on their own hardware, without paying cloud inference costs and without sending their data to anyone's server. In early benchmarks circulating on developer forums, Scout scores on par with GPT-5.4 mini and Gemini 2.0 Flash on standard coding and reasoning tasks, and slightly above both on structured data extraction.
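As a rough sanity check on the 24 GB figure, here is a minimal back-of-envelope sketch of the memory a 17-billion-parameter model's weights occupy at different precisions. It counts weights only, ignoring KV cache and activation overhead, which add several more gigabytes in practice; the numbers suggest that fitting Scout on a 24 GB card assumes quantized weights rather than full 16-bit precision.

```python
def weight_vram_gb(params_billions: float, bits_per_param: float) -> float:
    """Approximate gigabytes of VRAM needed just to hold the weights.

    params_billions: parameter count in billions (17 for Scout's headline figure)
    bits_per_param: numeric precision (16 = fp16/bf16, 8 = int8, 4 = 4-bit quant)
    """
    return params_billions * 1e9 * (bits_per_param / 8) / 1e9


for bits in (16, 8, 4):
    print(f"17B weights at {bits}-bit: ~{weight_vram_gb(17, bits):.1f} GB")
# 16-bit weights alone (~34 GB) exceed a 4090's 24 GB;
# 8-bit (~17 GB) and 4-bit (~8.5 GB) fit with headroom.
```

The same arithmetic explains why Maverick's 400-billion-parameter footprint stays in enterprise territory: even at 4-bit, its weights alone would need roughly 200 GB.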