by Subutai Ahmad

How to get LLM-driven applications into production

Feature
Sep 26, 2024 | 7 mins
Artificial Intelligence | Generative AI | Software Development

Five of the most common and complex challenges organizations face in putting large language models into production and how to tackle them.

Many organizations are building generative AI applications driven by large language models (LLMs), but few are transitioning successfully from prototypes to production. According to an October 2023 Gartner survey, 45% of organizations are piloting generative AI, while only 10% have fully deployed it. The pattern holds for enterprises, product companies, and even startups focused on LLM-based applications, with some estimates placing the failure rate as high as 80%.

So, why the chasm between the tantalizing potential of LLMs and the reality? Here are five of the most common and complex challenges organizations face in putting LLMs into production and how to tackle them.

Privacy, security, and compliance concerns

Deploying LLMs in enterprise settings presents significant challenges concerning data security, privacy, and compliance. Enterprises often hesitate to use LLMs for production software applications due to concerns about potential leaks of sensitive data during training. This risk is compounded by the stringent requirements to maintain compliance and ensure responsible customer data handling. For companies, the stakes are high, as mishandling data can lead to severe financial penalties and loss of trust. Consequently, finding solutions that safeguard data while leveraging the capabilities of LLMs is a critical barrier that must be overcome to realize the potential of AI in enterprise environments.

To address these concerns, enterprises should conduct thorough due diligence on the architecture and infrastructure of their AI training and inference systems. By carefully examining the workflows and dataflows, businesses can better understand the risks of undesirable access to their data and AI models throughout the product life cycle. This proactive approach ensures that all potential vulnerabilities are identified and mitigated, securing the data while harnessing the power of advanced AI technologies.

AI hallucinations

Another significant hurdle in deploying LLMs within enterprises is the risk of AI hallucination, particularly when compounded by pre-existing issues with data quality. Media coverage of AI hallucination and its potential negative impacts has heightened awareness and concern among business decision-makers. This often serves as a substantial impediment to green-lighting LLM-driven applications for production, especially in the absence of clear solutions for detecting and mitigating hallucination.

A practical approach to overcoming this challenge is selecting the most appropriate tool for the problem at hand. Although GPT models receive the most attention and are often the go-to solution, they are not the best fit for every enterprise application. Alternatives such as BERT-style encoder models, which excel at understanding, classifying, and analyzing documents with high accuracy, deserve consideration. These models are cheaper, faster, and far less prone to generating erroneous outputs than GPT models, in part because they score or label text rather than generate it freely.
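As a rough illustration of what the encoder-only route looks like, the sketch below classifies short documents with a fine-tuned DistilBERT checkpoint via the Hugging Face `transformers` pipeline. The model name and example texts are illustrative choices, not recommendations from this article.

```python
# A minimal sketch: document analysis with a BERT-family encoder instead of a
# generative model. The checkpoint and example texts are illustrative only.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",  # hypothetical choice
)

docs = [
    "The onboarding flow was fast and the support team resolved my issue.",
    "The service went down twice this week and nobody responded to my ticket.",
]

# The encoder returns a label and a confidence score; there is no free-form
# text for the model to hallucinate.
for doc, pred in zip(docs, classifier(docs)):
    print(f"{pred['label']:>8}  {pred['score']:.2f}  {doc}")
```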

Furthermore, employing retrieval-augmented generation (RAG) techniques, which combine the robust embedding capabilities of BERT with the generative prowess of GPT, can significantly enhance the quality of the end product. This strategy ensures that enterprises can appropriately leverage the strengths of various AI technologies, mitigating risks while optimizing performance.
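One way to picture that combination is a minimal retrieval-augmented generation loop: a BERT-family encoder (here via `sentence-transformers`) embeds the corpus and the query, the closest passages are retrieved, and only then is a generative model asked to answer from that context. The model name, corpus, and prompt wording below are assumptions for illustration, not a prescribed stack; the final generation call is deliberately omitted because providers and APIs vary.

```python
# A minimal RAG sketch: encoder embeddings for retrieval, a generative model
# for the final answer. Model name, corpus, and prompt are assumptions.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # BERT-family encoder, illustrative

corpus = [
    "Refunds are processed within 5 business days of approval.",
    "Enterprise plans include single sign-on and audit logging.",
    "Support tickets are answered within 24 hours on weekdays.",
]
corpus_emb = encoder.encode(corpus, normalize_embeddings=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k passages whose embeddings are closest to the query."""
    q = encoder.encode([query], normalize_embeddings=True)
    scores = (corpus_emb @ q.T).ravel()  # cosine similarity (vectors are normalized)
    return [corpus[i] for i in np.argsort(-scores)[:k]]

def build_prompt(query: str) -> str:
    """Ground the generative model in the retrieved passages only."""
    context = "\n".join(f"- {p}" for p in retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("How long do refunds take?"))
# The prompt would then be sent to a generative model (e.g., a GPT-style API).
```

Because the generative model is constrained to the retrieved passages, its answers are easier to trace back to source documents, which is the main hallucination-mitigation benefit the approach offers.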

LLM quality assessment

Deploying LLMs like GPT in enterprise environments presents unique issues in terms of quality assessment due to the subjective nature of the outputs. Unlike traditional classification tasks where results are delineated as right or wrong, the outputs from GPT models are often open to interpretation, complicating the determination of quality. This ambiguity poses significant difficulties for integrating LLMs into the conventional CI/CD processes used in software application deployment, as these typically rely on deterministic tests to ensure a product meets quality standards.

To address this issue effectively, enterprises must rethink their release processes to accommodate the nuances of LLMs and fold new evaluation techniques into the application development workflow. One strategy is to use an established LLM such as GPT-4 as a judge, scoring the outputs of other models to provide a comparative measure of quality. Adopting more incremental deployment methodologies, such as A/B testing and canary releases, is also highly recommended. These techniques surface potential issues early in the release process, helping to maintain quality standards despite the inherent ambiguity of LLM outputs.
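A minimal sketch of that judge-then-gate pattern follows. It assumes the OpenAI Python SDK (v1-style client) with an API key in the environment; the prompt wording, model names, and the 1-to-5 rubric are illustrative placeholders, not a standard.

```python
# A minimal LLM-as-judge sketch: ask a stronger model to grade candidate
# outputs on a fixed rubric, then gate a canary release on the scores.
# Assumes the OpenAI Python SDK (v1 client) and OPENAI_API_KEY; rubric,
# prompts, and model names are illustrative only.
from openai import OpenAI

client = OpenAI()

RUBRIC = (
    "Score the answer from 1 (unusable) to 5 (fully correct and grounded) "
    "for factual accuracy and relevance to the question. Reply with the digit only."
)

def judge(question: str, answer: str, judge_model: str = "gpt-4o") -> int:
    """Return a 1-5 quality score assigned by the judge model."""
    response = client.chat.completions.create(
        model=judge_model,
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": f"Question: {question}\nAnswer: {answer}"},
        ],
    )
    return int(response.choices[0].message.content.strip()[0])

def canary_passes(scores_old: list[int], scores_new: list[int], margin: float = 0.0) -> bool:
    """Promote the canary only if its mean judged score does not regress."""
    return sum(scores_new) / len(scores_new) >= sum(scores_old) / len(scores_old) - margin
```

In practice, a small slice of traffic is routed to the new model and it is promoted only when the judged scores hold up against the incumbent, which is the canary pattern referred to above.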

Operationalization challenges

Deploying LLMs in enterprise settings involves complex AI and data management considerations and the operationalization of intricate infrastructures, especially those that use GPUs. Efficiently provisioning GPU resources and monitoring their usage present ongoing challenges for enterprise devops teams. This complex landscape requires constant vigilance and adaptation as the technologies and best practices evolve rapidly.

To stay ahead, it is crucial for devops teams within enterprise software companies to continuously evaluate the latest developments in managing GPU resources. While this field is far from mature, acknowledging the associated risks and constructing a well-informed deployment strategy is essential. Furthermore, enterprises should also consider alternatives to GPU-only solutions. Exploring other computational resources or hybrid architectures can simplify the operational aspects of production environments and mitigate potential bottlenecks caused by limited GPU availability. This strategic diversification ensures smoother deployment and more robust performance of LLMs across different enterprise applications.
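For teams weighing the monitoring burden, here is a small sketch of what basic GPU visibility can look like using NVIDIA's NVML bindings (the `pynvml` module from the nvidia-ml-py package). It only reads per-device utilization and memory; thresholds, alerting, and how the numbers feed an existing dashboard are assumptions left to each team's tooling.

```python
# A minimal GPU telemetry sketch using NVML bindings (pynvml / nvidia-ml-py).
# Reads per-device utilization and memory; alerting and dashboards are out of scope.
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(handle)
        if isinstance(name, bytes):  # older pynvml versions return bytes
            name = name.decode()
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)  # % compute / memory activity
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)          # bytes used and total
        print(
            f"GPU {i} ({name}): compute {util.gpu}%, "
            f"memory {mem.used / mem.total:.0%} of {mem.total / 2**30:.0f} GiB"
        )
finally:
    pynvml.nvmlShutdown()
```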

Cost efficiency

Successfully deploying AI-driven applications, such as those using large language models in production, ultimately hinges on the return on investment. Technology advocates must be able to demonstrate how LLMs positively affect both the top line and the bottom line of the business. One critical factor that often goes underappreciated in this calculation is the total cost of ownership (TCO), which encompasses the costs of model training, application development, compute during both training and inference, ongoing management, and the expertise required to run the AI application life cycle.

For many technology leaders, the challenge lies in fully understanding and controlling these components. Without a comprehensive grasp of these factors, justifying the transition of AI initiatives from experimental to operational stages becomes problematic. Therefore, it is highly recommended that decision-makers maintain flexibility in their strategic planning and stay open to more cost-efficient AI solutions. This involves exploring innovations that can reduce computational costs and pursuing systems that are easier to manage in a production environment. By prioritizing these aspects, enterprises can optimize their financial outlay and enhance the scalability and sustainability of their AI deployments.
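To make the TCO point concrete, here is a back-of-envelope sketch of the calculation. Every figure is a hypothetical placeholder chosen only to show the structure of the estimate, not a benchmark for any real deployment.

```python
# A back-of-envelope TCO sketch. All figures are hypothetical placeholders;
# the point is the structure of the calculation, not the numbers.
MONTHS = 12

costs = {
    # one-time
    "model fine-tuning / training runs": 40_000,
    "application development": 120_000,
    # recurring (monthly figures scaled to a year)
    "inference compute": 8_000 * MONTHS,
    "monitoring and ops tooling": 1_500 * MONTHS,
    "ML / devops staff time": 15_000 * MONTHS,
}

total_first_year = sum(costs.values())

# Compare against the value side: hypothetical hours saved per month times a
# loaded hourly rate, to estimate when the project clears its cost.
monthly_value = 2_000 * 35  # hours saved per month * loaded hourly rate
breakeven_months = total_first_year / monthly_value

print(f"First-year TCO: ${total_first_year:,.0f}")
print(f"Estimated breakeven: {breakeven_months:.1f} months")
```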

A final word

Putting LLMs into production is a challenge with many unknowns. From safeguarding sensitive data to controlling costs to coping with outputs that are hard to reproduce and assess, it can get complicated. Companies must carefully examine workflows and dataflows, select the right type of model for the job, be prepared to tweak software release processes, consider alternatives to GPUs, and conduct a thorough TCO analysis.

But remember that everything is solvable. With some upfront homework and a willingness to learn and adjust technologies and processes, your organization can be among the few that successfully deploy LLM-based applications and begin reaping the benefits.

Subutai Ahmad is CEO at Numenta, where he has been instrumental in driving Numenta’s research, technology, and business since 2005.

Generative AI Insights provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss the challenges and opportunities of generative artificial intelligence. The selection is wide-ranging, from technology deep dives to case studies to expert opinion, but also subjective, based on our judgment of which topics and treatments will best serve InfoWorld’s technically sophisticated audience. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Contact doug_dineley@foundryco.com.