by Gokcen Tapkan

Is creating an in-house LLM right for your organization?

feature
Feb 26, 2024 | 7 mins
Artificial Intelligence | Generative AI | Machine Learning

Five key questions you should ask before embarking on the journey to create your own in-house large language model.


Business leaders have been under pressure to find the best way to incorporate generative AI into their strategies to yield the best results for their organization and stakeholders. According to Gartner, 38% of business leaders noted that customer experience and retention are the primary purpose of their genAI investments, making it essential to the future of their businesses. However, as enticing as it may seem, it is important to consider whether LLMs (large language models) are right for your business before developing your AI strategy.

While off-the-shelf LLMs are generally available and easy to access, there are challenges in using them effectively. These include a generic customer experience that lacks industry context, the added cost of outsourcing embedding models, and privacy concerns raised by sharing data externally. Training an in-house AI model can directly address these concerns, while also inspiring the team to apply the model creatively to other projects. Once you decide that you need a domain-specific AI, here are five key questions you should ask before embarking on the journey to create your own in-house model.

Question 1: What is the business problem and how can AI solve it?

Before delving into the world of foundational models and LLMs, take a step back and clearly define the problem you are looking to solve. Once you identify it, determine which natural language tasks you need. Examples of these tasks include summarization, named entity recognition, semantic textual similarity, and question answering, among others. A quick way to get a feel for these tasks is to run off-the-shelf models against your own text, as in the sketch below.
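Here is a minimal sketch, assuming Hugging Face's transformers library and some made-up cybersecurity snippets (neither is a prescribed toolchain), of what a few of those downstream tasks look like in code:

```python
# Trying common downstream NLP tasks with off-the-shelf pipelines, to help
# pin down which task actually fits the business problem. All sample texts
# are invented for illustration.
from transformers import pipeline

# Named entity recognition: pull organizations, people, and places from text.
ner = pipeline("ner", aggregation_strategy="simple")
print(ner("Acme Corp suffered a ransomware attack reported by Reuters."))

# Question answering: answer a question from a given context passage.
qa = pipeline("question-answering")
print(qa(
    question="What kind of attack occurred?",
    context="Acme Corp suffered a ransomware attack last quarter.",
))

# Summarization: condense a longer passage into a few sentences.
summarizer = pipeline("summarization")
report = (
    "The vendor disclosed a breach affecting customer records. "
    "An unpatched server was exploited, and remediation took two weeks. "
    "The company has since rolled out mandatory patch management."
)
print(summarizer(report, max_length=40, min_length=10))
```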

A downstream task and domain awareness are apples and oranges, and it’s important to know the difference. Despite their popularity, LLMs like GPT, Llama, and PaLM are only appropriate for downstream tasks (such as question answering and summarization) with few-shot prompting or additional fine-tuning. Although foundational models can function well in a wider context, they lack the industry- or business-specific domain expertise necessary to be useful in most applications. A model that achieves great results on downstream tasks will not necessarily have domain awareness for your specific industry.

Question 2: Are there industry-specific AI tools already available?

As part of the research phase of your AI strategy, it’s important to evaluate existing tools closely, because some of them may be industry-specific yet still miss nuances particular to your business. When auditing available tools, make sure the AI model understands both context and the vocabulary of your chosen language, so it can grasp prompts and generate responses relevant to your users.

In our case, after doing research and tests, we discovered there wasn’t a strong cybersecurity LLM for third-party risk specifically. So our team selected a BERT-based model for fine-tuning in cybersecurity two years ago.
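For a sense of what that fine-tuning step can look like, here is a minimal sketch using the Hugging Face Trainer; the data file, label scheme, and hyperparameters are hypothetical placeholders, not our actual pipeline:

```python
# A minimal sketch of fine-tuning a BERT-based classifier on domain data.
# "cyber_findings.csv" and the three-label risk scheme are hypothetical.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3)  # e.g., low / medium / high risk

# Expert-labeled corpus with "text" and "label" columns (placeholder file).
dataset = load_dataset("csv", data_files="cyber_findings.csv")["train"]
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True,
                            padding="max_length", max_length=256),
    batched=True,
)
splits = dataset.train_test_split(test_size=0.2)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="cyber-bert", num_train_epochs=3),
    train_dataset=splits["train"],
    eval_dataset=splits["test"],
)
trainer.train()
```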

Additionally, while constructing our AI model, we noticed that the outcomes consistently fell within a narrow range as we analyzed various texts within the cybersecurity domain. The base model we employed perceived the texts as homogeneous simply because they originated within the same domain. We worked hard to provide it with the context and nuances of the cybersecurity industry, which solved our problem of lacking domain awareness.
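To make that homogeneity problem concrete, here is a small sketch, assuming the sentence-transformers library and invented example sentences, of how a generic embedding model can score very different in-domain statements as nearly alike:

```python
# A minimal sketch of the "homogeneous domain text" problem: a generic
# embedding model may score all in-domain documents as highly similar,
# even when they describe opposite risk postures. Texts are invented.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

texts = [
    "The vendor patched the critical SQL injection vulnerability.",
    "Attackers exploited an unpatched SQL injection flaw at the vendor.",
    "The vendor maintains SOC 2 compliance for its data centers.",
]

embeddings = model.encode(texts, convert_to_tensor=True)
scores = util.cos_sim(embeddings, embeddings)

# Inspect pairwise similarities; a domain-blind model tends to compress
# these into a narrow band ("it is all cybersecurity text").
for i in range(len(texts)):
    for j in range(i + 1, len(texts)):
        print(f"({i}, {j}) cosine similarity: {scores[i][j].item():.2f}")
```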

Context is also essential because even today, genAI can hallucinate on specific matters and should not be trusted 100% as is. This is one of the many reasons why the Biden-Harris Administration released an executive order on safe, secure, and trustworthy AI. Before using an AI tool as a service, government agencies need to make sure the service is safe and trustworthy, which isn’t always obvious and can’t be verified simply by looking at a sample set of outputs. And while the executive order doesn’t apply to private sector businesses, these organizations should consider whether to adopt similar policies.

Although the training and fine-tuning process for an in-house model is quite lengthy, involving thorough testing, weakness identification, and model analysis, it will be worth it in the long run.

Question 3: Is your data ready? 

Your organization’s data is the most important asset to evaluate before training your own LLM. Companies that have accumulated high-quality data over time are the luckiest in today’s LLM age, as data is needed at almost every step of the process, including training, testing, re-training, and beta tests. High-quality data is the key to success when training an LLM, so it is important to consider what that truly means. The answer certainly changes depending on the task and domain, but a general rule is that high-quality data is data that needs minimal curation and leads to less re-training.

Once companies begin the journey to train an LLM, they typically discover that their data isn’t ready in several ways. The data could turn out to be too noisy, or ineffectively labeled due to poor expert selection or limited time allocated to experts. The data could also include hidden repetitions that provide minimal or no value to the training process, or it could fail to represent the domain or task fully, which may cause the resulting AI model to overfit. A simple screening pass, like the sketch below, can surface some of these problems early.
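Here is a minimal sketch of one such screening step, flagging hidden near-duplicates with TF-IDF cosine similarity; this is one simple approach among many, and the documents and threshold are illustrative:

```python
# Flagging near-duplicate training texts before they quietly skew the
# model. Documents and the similarity threshold are illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "The vendor encrypts all customer data at rest.",
    "The vendor encrypts all customer data at rest and in transit.",  # near-duplicate
    "The vendor conducts annual penetration tests.",
]

tfidf = TfidfVectorizer().fit_transform(docs)
sims = cosine_similarity(tfidf)

# Flag suspiciously similar pairs for expert review or removal.
THRESHOLD = 0.8
for i in range(len(docs)):
    for j in range(i + 1, len(docs)):
        if sims[i, j] > THRESHOLD:
            print(f"Possible duplicates: doc {i} and doc {j} "
                  f"(similarity {sims[i, j]:.2f})")
```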

It’s important to anticipate that data could easily become the bottleneck of your project, as it takes the most time to organize. It could even take years before data is truly AI-ready.

Question 4: Do you have sufficient experts available to train AI models? 

Experts play an important role in generating data and determining its quality. Why? Because we still need humans to generate reliable data for the training process. Synthetically generated data sets do exist, but they are not useful unless they are evaluated and qualified by human experts.

When selecting your experts, choose people with deep industry knowledge to fine-tune your model, whether in-house or outsourced. More specifically, you will need experts to label data, give feedback about data, test data, and retrain the model based on that feedback. This is an essential part of getting accurate, reliable results from your trained AI model.

Question 5: What are your time constraints? 

Training an in-house AI model is a costly and lengthy process. The business problem, the quality of readily available data, and the number of experts and AI engineers involved all affect the length and quality of the project. Because the process relies on trial and error, it inherently takes longer before the solution is ready for use.

Besides the issues that could stem from the data, other challenges might arise when setting the hyperparameters of the training algorithm, such as the learning rate, the number of epochs, and the number of layers. This is the point where AI experts might need to re-engineer the model to address overfitting and catastrophic forgetting issues that surface in the test phases, which can cost the project extra time.
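As a rough illustration of those knobs, here is a hedged sketch of a Hugging Face TrainingArguments configuration with early stopping, one common guard against overfitting; every value is a starting point to tune, not a recommendation:

```python
# Illustrative hyperparameter settings; every value is a starting point
# to tune against your own validation data, not a recommendation.
from transformers import EarlyStoppingCallback, TrainingArguments

args = TrainingArguments(
    output_dir="domain-model",
    learning_rate=2e-5,               # too high can cause catastrophic forgetting
    num_train_epochs=10,              # an upper bound; early stopping may end sooner
    per_device_train_batch_size=16,
    eval_strategy="epoch",            # "evaluation_strategy" in transformers < 4.41
    save_strategy="epoch",
    load_best_model_at_end=True,      # required by the early stopping callback
    metric_for_best_model="eval_loss",
)

# Stop if validation loss fails to improve for two consecutive evaluations;
# pass `args` and `callbacks=[early_stop]` to a transformers Trainer.
early_stop = EarlyStoppingCallback(early_stopping_patience=2)
```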

Although a carefully thought-out process will reduce the stress, there is always the risk that a new LLM solution will emerge and render your solution outdated. How large that risk is boils down to how specialized and niche your domain is. Seek a balance between timing and quality, given the rapid pace of AI development.

As is the case with many innovative solutions, there is not a one-size-fits-all approach. Weighing your options regarding the model that is right for your business is the first step when starting your company’s AI journey. For business leaders, training an LLM from scratch could sound daunting, but if you have data available and a domain-specific “business problem” that a generic LLM will not solve, it will be worth the investment in the long run.

Gokcen Tapkan is director of data research at Black Kite.

Generative AI Insights provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss the challenges and opportunities of generative artificial intelligence. The selection is wide-ranging, from technology deep dives to case studies to expert opinion, but also subjective, based on our judgment of which topics and treatments will best serve InfoWorld’s technically sophisticated audience. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Contact doug_dineley@foundryco.com.