Analysts and experts say that the openness and accuracy of the Llama 3.1 family of models pose an existential threat to providers of closed proprietary large language models. Credit: Noe Besso/Shutterstock

Meta's newly unveiled Llama 3.1 family of large language models (LLMs), which includes a 405-billion-parameter model as well as 70-billion-parameter and 8-billion-parameter variants, is a boon for enterprises and a bane for proprietary LLM vendors, analysts and experts say.

"While at one end the open weights of the updated Llama 3.1 models will provide an option for enterprises to shun or reduce their usage of closed proprietary LLMs, at the other end these models will disrupt companies or vendors building and selling LLMs," said Tobias Zwingmann, managing partner at AI prototyping service platform Rapyd.AI.

How will Llama 3.1 help enterprises and developers?

The advantages of the Llama 3.1 family for enterprises and developers lie in its open weights and in its benchmark performance against rival LLMs, both proprietary ones such as OpenAI's GPT-4o and open ones such as Google's Gemma 2, according to analysts.

"Providing a language model with open weights empowers businesses to build custom AI solutions for their use cases without incurring hefty licensing fees for proprietary models. While enterprises benefit from cost savings and increased flexibility, developers can accelerate innovation by leveraging a solid foundation," said Paul Nashawaty, principal analyst at The Futurum Group.

IDC research vice president Arnal Dayaratna said that enterprises could fine-tune an open model from the Llama 3.1 family on their own proprietary data without concern that their data and intellectual property will be shared with another vendor. "This will also enable enterprises and developers to avoid vendor lock-in," Dayaratna added.

The release of the larger 405-billion-parameter model, according to Omdia chief analyst Bradley Shimmin, is even more significant for enterprises, as they now have access to a free LLM that matches the performance and intelligence of proprietary models such as GPT-4, Gemini, and Claude.

Meta said in a blog post that the 405B Llama 3.1 model outperformed models such as Nemotron-4 340B Instruct, GPT-4, and Claude 3.5 Sonnet on benchmarks including MMLU, MATH, GSM8K, and ARC Challenge, which test LLMs on general knowledge, mathematics, and reasoning. Its performance was also close to GPT-4o's: for context, GPT-4o scored 88.7 on the MMLU benchmark, and Llama 3.1 405B scored 88.6.

The smaller Llama 3.1 models, 8B and 70B, which have been updated with larger context windows and support for multiple languages, also performed better than or close to proprietary LLMs on the same benchmarks, Meta said in the blog post.

In April, Meta released the previous versions of these models, Llama 3 8B and Llama 3 70B, which boasted architectural improvements over Llama 2, including a standard decoder-only transformer architecture, grouped query attention (GQA), and a higher-quality training data set.

Llama 3.1's larger 405B variant, according to Anton McGonnell, product leader at generative AI platform provider SambaNova Systems, can offer better accuracy for general-purpose tasks, which will allow enterprises to further accelerate improvements in both employee and customer use cases.

"We expect to see developers use techniques like speculative decoding, where less complex models handle the bulk of processing, and then call upon the larger model to verify work and correct errors when needed," McGonnell said, adding that this could be an efficient way to run AI models, as it opens new avenues for optimizing computing resources and speeds up responses in real-time applications. (A toy sketch of the pattern appears below.)

Additionally, IDC's Dayaratna pointed out that the Llama 3.1 405B model can perform synthetic data generation as well as model distillation, meaning the transfer of knowledge from a larger model to a smaller one. These capabilities enable enterprises to run additional analytic workstreams, Dayaratna added.
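To make McGonnell's point concrete, here is a minimal sketch of the speculative-decoding loop in Python. The two "models" are hard-coded stand-ins rather than real LLMs, and production systems accept or reject draft tokens probabilistically rather than by exact match; only the shape of the draft-then-verify loop is the point.

    # Toy sketch of speculative decoding: a cheap draft model proposes
    # several tokens at once, and the expensive model only verifies them.
    # Both "models" are hard-coded stand-ins, not real LLMs.

    def draft_model(prompt, k=4):
        # Hypothetical cheap model: proposes the next k tokens.
        return ["the", "cat", "sat", "down"][:k]

    def target_model(prompt, proposed):
        # Hypothetical expensive model: in a single pass, returns the token
        # it would itself produce at each position of the proposed draft.
        gold = ["the", "cat", "sat", "up"]
        return gold[:len(proposed)]

    def speculative_step(prompt):
        proposed = draft_model(prompt)
        verified = target_model(prompt, proposed)
        accepted = []
        for p, v in zip(proposed, verified):
            if p == v:
                accepted.append(p)   # draft token confirmed for free
            else:
                accepted.append(v)   # first mismatch: keep the big model's token
                break                # later draft tokens are now invalid
        return accepted

    print(speculative_step("The cat"))  # -> ['the', 'cat', 'sat', 'up']

Because the large model checks several draft tokens in one pass, most of the generation cost shifts to the cheaper model, which is what makes the approach attractive for real-time applications.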
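Dayaratna's other point, model distillation, can be sketched just as briefly. In the PyTorch snippet below, two tiny linear layers stand in for a 405B teacher and a much smaller student; the structure of the loss, not the models, is what matters.

    # Minimal knowledge-distillation sketch: the student is trained to match
    # the teacher's softened output distribution. The linear layers are
    # stand-ins for a large frozen LLM and a small one being trained.
    import torch
    import torch.nn.functional as F

    teacher = torch.nn.Linear(16, 100)  # stand-in for the large, frozen model
    student = torch.nn.Linear(16, 100)  # stand-in for the small model in training
    optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
    T = 2.0  # temperature: softens the teacher's probability distribution

    for step in range(100):
        x = torch.randn(32, 16)  # stand-in batch of inputs
        with torch.no_grad():
            teacher_probs = F.softmax(teacher(x) / T, dim=-1)
        student_log_probs = F.log_softmax(student(x) / T, dim=-1)
        # KL divergence pulls the student's distribution toward the teacher's
        loss = F.kl_div(student_log_probs, teacher_probs,
                        reduction="batchmean") * T * T
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()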
Is Llama 3.1 too expensive to deploy for enterprises?

While Llama 3.1 is more capable than its predecessors, deploying the model may be too expensive for small and medium-sized enterprises, analysts believe.

The Llama 3.1 405B variant is extremely expensive to run because the unmodified model requires two Nvidia H100 servers, or 16 GPUs, said Dylan Patel, chief analyst at semiconductor research firm SemiAnalysis. (In 16-bit precision, the 405 billion parameters alone occupy roughly 810GB of memory, more than the 640GB available across the eight 80GB GPUs of a single H100 server.) Patel noted that this makes it much more expensive than past models, which fit on a single lower-end GPU (Llama 8B) or two high-end GPUs (Llama 70B).

"Renting two H100 servers will cost approximately $300,000 a year, so deploying Llama 3.1 405B on premises is too expensive for small firms," Patel said.

The reason behind the increase in infrastructure costs is the increase in model parameters, which yields more accurate results, SambaNova Systems' McGonnell said. Typically, an LLM provider releases multiple variants of a model so that enterprises can choose between latency and accuracy depending on the use case: a model with more parameters can be more accurate, while one with fewer parameters requires less computation, responds faster, and therefore costs less to run.

However, Patel and McGonnell both pointed out that most large-scale enterprises are likely to take advantage of the Llama 3.1 405B model, whether for fine-tuning and training other models or for production use cases such as chatbots. "Large enterprises might not view the cost of running the 405B model as that expensive given the level of intelligence and usefulness the model brings," Patel said.

In addition, the analysts said there is another way to reduce the cost of running the larger model: several large cloud service providers, along with other model-serving vendors, are already working to bring the new model to their customers.

"Most firms will rely on cloud APIs to utilize Llama 3.1 405B. Every major cloud provider offers the model," Patel said, adding that using APIs allows enterprises to access the necessary compute resources on a pay-as-you-go basis, reducing upfront investments.
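In practice, consuming the 405B model this way looks much like calling any other hosted LLM. The sketch below assumes an OpenAI-compatible endpoint, which a number of hosts expose; the base URL and model identifier are placeholders rather than any specific vendor's values, so check your provider's documentation.

    # Calling a hosted Llama 3.1 405B through an OpenAI-compatible API.
    # Endpoint and model naming vary by provider; these are placeholders.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.example-provider.com/v1",  # placeholder endpoint
        api_key="YOUR_API_KEY",
    )

    response = client.chat.completions.create(
        model="meta-llama/Meta-Llama-3.1-405B-Instruct",  # varies by host
        messages=[{"role": "user",
                   "content": "Summarize the key points of our Q3 report."}],
        max_tokens=200,
    )
    print(response.choices[0].message.content)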
Meta said it had partnered with the likes of Accenture, AWS, AMD, Anyscale, Cloudflare, Databricks, Dell, Deloitte, Fireworks.ai, Google Cloud, Groq, Hugging Face, IBM watsonx, Infosys, Intel, Kaggle, Microsoft Azure, Nvidia DGX Cloud, OctoAI, Oracle Cloud, PwC, Replicate, Sarvam AI, Scale.AI, SNCF, Snowflake, Together AI, and the UC Berkeley vLLM Project to make the Llama 3.1 family of models available and simpler to use.

While cloud service providers such as AWS and Oracle will provide the latest models, partners such as Groq, Dell, and Nvidia will allow developers to use synthetic data generation and advanced retrieval-augmented generation (RAG) techniques, Meta said, adding that Groq has optimized low-latency inference for cloud deployments and that Dell has achieved similar optimizations for on-premises systems. Other large models, such as Claude, Gemini, and GPT-4o, are likewise served via APIs.

Additionally, McGonnell pointed out that the availability of Llama 3.1 will spark a race among AI cloud service providers and model-serving vendors to offer the most efficient and cost-effective API solutions for deploying Llama 3.1 405B. According to Patel, Together AI and Fireworks.ai, both Meta partners in distributing the latest model, are bringing the most innovative inference optimizations to market, reducing costs significantly.

Will Llama 3.1 spell doom for rival LLM providers?

The release of a somewhat open LLM that can perform as well as or better than closed proprietary LLMs poses a significant challenge to rival LLM providers, big or small, experts and analysts believe.

"Companies like Cohere, Aleph Alpha, and similar startups developing proprietary LLMs likely won't exist in the next one or two years, or they'll just survive in a much smaller niche and more expensive form. It's like betting on Solaris when the world gravitated toward Windows, Mac, and Linux," Zwingmann of Rapyd.AI said.

In addition, McGonnell pointed out that as LLMs become commoditized by their open nature, proprietary providers like OpenAI will need to compete either by reducing their costs or by improving their performance. "We have been seeing OpenAI starting to release cheaper versions of GPT-4, suggesting that they are focused on reducing costs," McGonnell said.

Further, within 24 hours of Meta releasing the Llama 3.1 update, OpenAI took to Twitter, now rebranded as X, to alert customers that it was releasing a free tier for customizing its GPT-4o mini model.

Rapyd.AI's Zwingmann said that this battle between open and proprietary LLMs will benefit enterprises. "Expect token costs for LLMs to come down even further. There's no longer a big moat that allows any vendor to charge significantly more than the market average," he explained. Tokens are the units used to measure the amount of text an LLM API processes when it breaks down a user's query, and they are the basis on which API usage is typically billed.
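What falling token prices mean for a deployment budget is easy to estimate. The figures below are illustrative assumptions, not any vendor's actual rates:

    # Back-of-the-envelope token cost math. All prices and traffic figures
    # are illustrative assumptions; check current vendor price sheets.
    price_per_million_input = 3.00   # USD per 1M input tokens (assumed)
    price_per_million_output = 9.00  # USD per 1M output tokens (assumed)

    requests_per_day = 10_000
    input_tokens = 1_000   # average prompt length per request
    output_tokens = 300    # average response length per request

    daily_cost = requests_per_day * (
        input_tokens * price_per_million_input
        + output_tokens * price_per_million_output
    ) / 1_000_000
    print(f"~${daily_cost:,.2f} per day")  # -> ~$57.00 per day

Under these assumptions, every drop in per-token pricing flows straight through to the daily bill, which is why Zwingmann expects the open-versus-proprietary battle to benefit enterprises.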
The Llama family of models, according to Omdia's Shimmin, already dominates demand for rival LLMs, including larger proprietary ones from Google, Anthropic, and OpenAI. Omdia's research is based on investigating and collating job posts seeking skills for working with Meta's family of LLMs.

Open weights vs. open source

Although Meta and its CEO Mark Zuckerberg describe the latest family of Llama models as open source, several analysts beg to differ. Omdia's Shimmin said that Meta's models are not truly open source as defined by the Open Source Initiative. "All Llama models are not actually open source as we would see with software licensed under MIT or Apache agreements. I would rather say that it is an open and permissive community license that gives AI practitioners everything they need to build AI outcomes for commercial use," Shimmin explained.

Shimmin added that although Meta provides the model weights for all of its LLMs, the company doesn't provide full transparency into the data used to pre-train them. The larger problem, according to experts, is that there is currently no accepted definition of what an open source LLM is or should be.