Arctic will be available under the Apache 2.0 license and can be accessed via Snowflake Cortex for serverless inference or across providers such as AWS, Azure, Nvidia, Perplexity, and Together AI.

Cloud-based data warehouse company Snowflake has developed an open-source large language model (LLM), Arctic, to take on the likes of Meta's Llama 3, Mistral's family of models, xAI's Grok-1, and Databricks' DBRX.

Arctic is aimed at enterprise tasks such as SQL generation, code generation, and instruction following, Snowflake said Wednesday. It can be accessed via Snowflake's managed machine learning and AI service, Cortex, for serverless inference via its Data Cloud offering, and across model providers such as Hugging Face, Lamini, AWS, Azure, Nvidia, Perplexity, and Together AI, among others. Enterprise users can download it from Hugging Face and get inference and fine-tuning recipes from Snowflake's GitHub repository, the company said.

Snowflake Arctic versus other LLMs

Fundamentally, Snowflake's Arctic is very similar to most other open-source LLMs, which also use the mixture of experts (MoE) architecture; these include DBRX, Grok-1, and Mixtral, among others. The MoE architecture builds an AI model from smaller models trained on different datasets, then combines these smaller models into one model that excels at solving different kinds of problems. Arctic is a combination of 128 smaller expert models.

One exception among the open-source models on the market is Meta's Llama 3, which has a dense transformer architecture, an evolution of the encoder-decoder architecture developed by Google in 2017 for translation purposes.

The difference between the two architectures, according to Scott Rozen-Levy, director of technology practice at digital services firm West Monroe, is that an MoE model allows for more efficient training because it is more compute efficient. "The jury is still out on the right way to compare complexity and its implications on quality of LLMs, whether MoE models or fully dense models," Rozen-Levy said.

Snowflake claims that its Arctic model outperforms most open-source models and a few closed-source ones while activating fewer parameters and using less compute power to train. "Arctic activates roughly 50% less parameters than DBRX, and 75% less than Llama 3 70B during inference or training," the company said, adding that it uses only two of its 128 expert models at a time, activating about 17 billion of its 480 billion parameters (a routing pattern sketched in the toy example at the end of this section).

DBRX and Grok-1, which have 132 billion parameters and 314 billion parameters respectively, also activate only a fraction of their parameters on any given input: Grok-1 uses two of its eight experts per input, while DBRX activates just 36 billion of its 132 billion parameters.

However, Dylan Patel, chief analyst at semiconductor research firm SemiAnalysis, said that Llama 3 is still significantly better than Arctic by at least one measure. "Cost wise, the 475-billion-parameter Arctic model is better on FLOPS, but not on memory," Patel said, referring to the computing capacity and memory Arctic requires.

Additionally, Patel said, Arctic is well suited for offline rather than online inferencing. Offline inferencing, otherwise known as batch inferencing, runs predictions ahead of time, stores them, and presents them on request. Online inferencing, otherwise known as dynamic inferencing, generates predictions in real time.
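To make the active-parameter arithmetic concrete, here is a minimal, hypothetical Python sketch of top-2 MoE routing. The 128-expert count and two-experts-per-token behavior mirror the figures reported for Arctic, but the router, layer shapes, and random weights are illustrative stand-ins, not Snowflake's implementation.

```python
# Toy illustration of top-2 mixture-of-experts routing, loosely modeled on
# the behavior described above (two of 128 experts active per input).
# A simplified sketch, not Snowflake's actual implementation.
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 128   # Arctic reportedly combines 128 expert models
TOP_K = 2           # only two experts are active for any given token
DIM = 16            # toy hidden dimension

# Each "expert" here is just a random linear layer standing in for a real MLP.
experts = [rng.standard_normal((DIM, DIM)) for _ in range(NUM_EXPERTS)]
router = rng.standard_normal((DIM, NUM_EXPERTS))  # learned in a real model

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route a token vector to its top-k experts and mix their outputs."""
    logits = x @ router
    top_k = np.argsort(logits)[-TOP_K:]              # indices of the 2 best experts
    weights = np.exp(logits[top_k] - logits[top_k].max())
    weights /= weights.sum()                         # softmax over chosen experts
    # Only TOP_K of NUM_EXPERTS weight matrices are touched per token, which
    # is why active parameters are a small fraction of total parameters.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top_k))

token = rng.standard_normal(DIM)
print(moe_forward(token).shape)   # (16,)
print(f"active experts per token: {TOP_K}/{NUM_EXPERTS}")
```

Even in this toy version, each token touches only two expert weight matrices out of 128, which is the mechanism behind Snowflake's claim of roughly 17 billion active parameters out of 480 billion.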
Benchmarking the benchmarks

Arctic outperforms open-source models such as DBRX and Mixtral-8x7B on coding and SQL generation benchmarks such as HumanEval+, MBPP+, and Spider, according to Snowflake, but it fails to outperform many models, including Llama 3-70B, on general language understanding (MMLU), MATH, and other benchmarks. Experts say this is where the extra parameters in models such as Llama 3 are likely to add benefit.

"The fact that Llama 3-70B does so much better than Arctic on GSM8K and MMLU benchmarks is a good indicator of where Llama 3 used all those extra neurons, and where this version of Arctic might fail," said Mike Finley, CTO of AnswerRocket, an analytics software provider.

"To understand how well Arctic really works, an enterprise should put one of their own model loads through the paces rather than relying on academic tests," Finley said, adding that although Arctic performs well on the Spider benchmark, it is worth testing whether it will perform well on a specific enterprise's schemas and SQL dialects.

Enterprise users, according to Omdia chief analyst Bradley Shimmin, shouldn't focus too much on benchmarks when comparing models. "The only relatively objective score we have at the moment is LMSYS Arena Leaderboard, which gathers data from actual user interactions. The only true measure remains the empirical evaluation of a model in situ within the context of its perspective use case," Shimmin said.
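That kind of in-situ evaluation can start small. The sketch below uses the Hugging Face transformers library to run a SQL-generation smoke test against a sample schema. The model ID shown is the one Snowflake published on Hugging Face at launch, but the schema and prompt are invented, and the full model's hardware demands are substantial, so treat this as an outline to verify and adapt, not a turnkey script.

```python
# Hedged sketch: smoke-testing Arctic on your own SQL schema, per Finley's
# advice to run your own workloads rather than relying on academic benchmarks.
# Assumes the model ID Snowflake published at launch; verify the ID and the
# hardware requirements first, since the full checkpoint far exceeds one GPU.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Snowflake/snowflake-arctic-instruct"  # verify before use

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, trust_remote_code=True, device_map="auto"
)

# Substitute a real schema and question from your own workload here.
prompt = (
    "Given the table orders(order_id INT, customer_id INT, total DECIMAL, "
    "placed_at TIMESTAMP), write SQL to find each customer's total spend "
    "in the last 30 days."
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Running a handful of such prompts drawn from real query logs, and checking the generated SQL against your own dialect and schema conventions, gives a more decision-relevant signal than a Spider score.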
Why is Snowflake offering Arctic under the Apache 2.0 license?

Snowflake is offering Arctic and its other text embedding models, along with code templates and model weights, under the Apache 2.0 license, which allows commercial use without licensing costs. In contrast, Meta's Llama family of models has a more restrictive license for commercial use.

The strategy of going completely open source could benefit Snowflake on several fronts, analysts said.

"With this approach, Snowflake gets to keep the logic that is truly proprietary while still allowing other people to tweak and improve on the model outputs. In AI, the model is an output, not source code," said Hyoun Park, chief analyst at Amalgam Insights. "The true proprietary methods and data for AI are the training processes for the model, the training data used, and any proprietary methods for optimizing hardware and resources for the training process," Park said.

The other upside Snowflake might see is more developer interest, according to Paul Nashawaty, practice lead of modernization and application development at Futurum Research. "Open-sourcing components of its model can attract contributions from external developers, leading to enhancements, bug fixes, and new features that benefit Snowflake and its users," the analyst explained, adding that being open source might also win market share through "sheer good will."

West Monroe's Rozen-Levy agreed with Nashawaty but pointed out that being pro open source doesn't necessarily mean Snowflake will release everything it builds under the same license. "Perhaps Snowflake has more powerful models that they are not planning on releasing in open source. Releasing LLMs in a fully open-source fashion is perhaps a moral and/or PR play against the full concentration of AI by one institution," the analyst explained.

Snowflake's other models

Earlier this month, the company released a family of five text embedding models with different parameter sizes, claiming that they performed better than other embedding models.

LLM providers are increasingly releasing multiple variants of their models so that enterprises can choose between latency and accuracy, depending on the use case. While a model with more parameters can be relatively more accurate, one with fewer parameters requires less computation, takes less time to respond, and therefore costs less.

"The models give enterprises a new edge when combining proprietary datasets with LLMs as part of a retrieval augmented generation (RAG) or semantic search service," the company wrote in a blog post, adding that these models were a result of the technical expertise and knowledge it gained from its acquisition of Neeva last May.

The five embedding models, too, are open source and available on Hugging Face for immediate use; access via Cortex is currently in preview.
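As a sketch of the semantic-search building block behind the RAG use case the blog post describes, the snippet below runs a toy similarity search with one of the Arctic embedding models via the sentence-transformers library. The model ID is one of the five Snowflake published on Hugging Face, but the documents and query are invented for illustration, and production RAG pipelines add retrieval over a real vector store.

```python
# Hedged sketch: semantic search with one of Snowflake's embedding models,
# the retrieval step that RAG builds on. The model ID is one of the five
# published on Hugging Face; verify names and sizes before use.
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("Snowflake/snowflake-arctic-embed-m")

# Stand-in corpus; in practice these would be an enterprise's own documents.
docs = [
    "Arctic is aimed at SQL generation and instruction following.",
    "Snowflake Cortex offers serverless inference in the Data Cloud.",
    "The embedding models came out of the Neeva acquisition.",
]
query = "Which service runs inference without managing servers?"

# Normalized embeddings make cosine similarity a plain dot product.
doc_vecs = model.encode(docs, normalize_embeddings=True)
query_vec = model.encode(query, normalize_embeddings=True)

scores = doc_vecs @ query_vec
print(docs[int(np.argmax(scores))])  # best-matching document
```

In a full RAG service, the top-scoring documents would be passed to an LLM such as Arctic as context for answer generation, which is the pairing of proprietary data and LLMs the blog post refers to.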