Hallucinations are the bane of AI-driven insights. Here’s what search can teach us about trustworthy responses, according to Snowflake’s CEO



Businesses are eager to capitalize on the power of generative AI, but they are wrestling with the question of trust: How do you build a generative AI application that provides accurate responses and doesn’t hallucinate? This issue has vexed the industry for the past year, but it turns out that we can learn a lot from an existing technology: search.

By looking at what search engines do well (and what they don’t), we can learn to build more trustworthy generative AI applications. This is important because generative AI can bring immense improvements in efficiency, productivity, and customer service–but only when enterprises can be sure their generative AI apps provide reliable and accurate information.

In some contexts, the level of accuracy required from AI is lower. If you’re building a program that decides which ad to display next on a web page, an AI program that’s mostly accurate is still valuable. But if a customer asks your AI chatbot how much their invoice is this month or an employee asks how many PTO days they have left, there is no margin for error.

Search engines have long sought to provide accurate answers from vast troves of data, and they are successful in some areas and weaker in others. By taking the best aspects of search and combining them with new approaches that are better suited for generative AI in business, we can solve the trust problem and unlock the power of generative AI for the workplace.

Sorting the wheat from the chaff

One area where search engines perform well is sifting through large volumes of information and identifying the highest-quality sources. For example, by looking at the number and quality of links to a web page, search engines return the web pages that are most likely to be trustworthy. Search engines also favor domains that are known to be trustworthy, such as federal government websites, or established news sources such as the BBC.

In business, generative AI apps can emulate these ranking techniques to return reliable results. They should favor the sources of company data that have been most frequently accessed, searched, or shared. And they should strongly favor sources that are known to be trustworthy, such as corporate training manuals or a human resources database, while disfavoring less reliable sources.
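As a rough illustration (not any particular vendor's implementation), the sketch below shows how an application might score candidate sources before handing them to a model. The origin labels, trust weights, and the 70/30 blend of trust and usage signals are assumptions invented for this example.

```python
from dataclasses import dataclass

@dataclass
class Source:
    title: str
    origin: str        # e.g. "hr_database", "training_manual", "shared_drive"
    access_count: int  # how often employees have opened, searched, or shared it

# Hypothetical per-origin trust weights; a real system would maintain these
# based on how authoritative each repository is considered to be.
TRUST_WEIGHTS = {
    "hr_database": 1.0,
    "training_manual": 0.9,
    "shared_drive": 0.4,
}

def rank_sources(candidates: list[Source], top_k: int = 3) -> list[Source]:
    """Order candidate sources by trustworthiness and how often they are used."""
    def score(src: Source) -> float:
        trust = TRUST_WEIGHTS.get(src.origin, 0.2)   # unknown origins get low trust
        usage = min(src.access_count / 1000, 1.0)    # cap the usage signal
        return 0.7 * trust + 0.3 * usage             # weighting is illustrative
    return sorted(candidates, key=score, reverse=True)[:top_k]
```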

LLMs are an interlocutor, not an oracle

Many foundational large language models (LLMs) have been trained on the wider Internet, which as we all know contains both reliable and unreliable information. This means that they’re able to address questions on a wide variety of topics, but they have yet to develop the more mature, sophisticated ranking methods that search engines use to refine their results. That’s one reason why many reputable LLMs can hallucinate and provide incorrect answers.

One of the learnings here is that developers should think of LLMs as a language interlocutor, rather than a source of truth. In other words, LLMs are strong at understanding language and formulating responses, but they should not be used as a canonical source of knowledge. To address this problem, many businesses train their LLMs on their own corporate data and on vetted third-party data sets, minimizing the presence of bad data. By adopting the ranking techniques of search engines and favoring high-quality data sources, AI-powered applications for businesses become far more reliable.

The humility to say ‘I don’t know’

Search has also gotten quite good at understanding context to resolve ambiguous queries. For example, a search term like “swift” can have multiple meanings–the author, the programming language, the banking system, the pop sensation, and so on. Search engines look at factors like geographic location and other terms in the search query to determine the user’s intent and provide the most relevant answer.

However, even when a search engine can't provide the right answer, whether because it lacks sufficient context or because no page with the answer exists, it will attempt an answer anyway. For example, if you ask a search engine, "What will the economy be like 100 years from now," or "How will the Kansas City Chiefs perform next season," there may be no reliable answer available. But search engines are built on the philosophy that they should provide an answer in almost all cases, even when they lack a high degree of confidence.

This is unacceptable for many business use cases, so generative AI applications need a layer between the search (or prompt) interface and the LLM that examines the possible contexts and determines whether it can provide an accurate answer. If this layer finds that it cannot answer with a high degree of confidence, it needs to disclose that to the user. This greatly reduces the likelihood of a wrong answer, helps build trust with the user, and gives them the option to supply additional context so the gen AI app can produce a confident result.
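Conceptually, that confidence layer can be as simple as a threshold check. The sketch below is a minimal illustration, assuming some upstream step has already produced a draft answer and a confidence score between 0 and 1; the threshold value and the fallback wording are placeholders.

```python
CONFIDENCE_THRESHOLD = 0.75  # illustrative cutoff; tune per application

def answer_or_defer(draft_answer: str, confidence: float) -> str:
    """Return the drafted answer only when confidence clears the threshold;
    otherwise admit uncertainty and invite the user to add context."""
    if confidence >= CONFIDENCE_THRESHOLD:
        return draft_answer
    return (
        "I don't have enough information to answer that confidently. "
        "Could you share more detail, such as the account or time period you mean?"
    )
```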

This layer between the user interface and the LLM can also employ a technique called Retrieval Augmented Generation, or RAG, to consult trusted data that lives outside the LLM itself.
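Here is a minimal sketch of the RAG pattern, assuming a hypothetical `vector_store.search` method that returns relevant passages from a vetted corpus and a hypothetical `llm.generate` method that completes a prompt; neither is a real library call.

```python
def rag_answer(question: str, vector_store, llm, top_k: int = 3) -> str:
    """Ground the model's answer in trusted documents retrieved at query time,
    rather than in whatever the model absorbed during training."""
    # 1. Retrieve the most relevant passages from a vetted corpus (hypothetical API).
    passages = vector_store.search(question, top_k=top_k)
    context = "\n\n".join(p.text for p in passages)

    # 2. Ask the model to answer using only the retrieved context (hypothetical API).
    prompt = (
        "Answer the question using only the context below. "
        "If the context is insufficient, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return llm.generate(prompt)
```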

Show your work

Explainability is another weak area for search engines, but one that generative AI apps must embrace to build greater trust. Just as high school teachers tell their students to show their work and cite their sources, generative AI applications must do the same. By disclosing their sources of information, these apps let users see where information came from and why they should trust it. Some of the public LLMs have started to provide this transparency, and it should be a foundational element of generative AI-powered tools used in business.
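One way to "show your work" in practice is to number the retrieved passages, instruct the model to cite them, and append a source list to the response. The sketch below reuses the same hypothetical `llm.generate` call and passage objects (with `text`, `title`, and `origin` fields) as the earlier examples.

```python
def answer_with_citations(question: str, passages: list, llm) -> str:
    """Generate an answer grounded in numbered passages and append the sources,
    so users can see where the information came from."""
    context = "\n\n".join(f"[{i + 1}] {p.text}" for i, p in enumerate(passages))
    prompt = (
        "Answer using only the numbered passages below and cite them like [1].\n\n"
        f"Passages:\n{context}\n\nQuestion: {question}"
    )
    answer = llm.generate(prompt)  # hypothetical API
    sources = "\n".join(f"[{i + 1}] {p.title} ({p.origin})" for i, p in enumerate(passages))
    return f"{answer}\n\nSources:\n{sources}"
```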

Going in with our eyes open

Despite every effort, it will be challenging to build AI applications that make very few mistakes. And yet the benefits are too significant to sit on the sidelines and hope that competitors don’t surge ahead. That puts an onus on business users to approach AI tools with their eyes open. Just as the internet has changed how people relate to news and news sources, business users must develop an educated skepticism and learn to look for signs of trustworthy AI. That means demanding transparency from the AI applications we use, seeking explainability, and being conscious of potential biases.

We’re on an exciting journey to a new class of applications that will transform our work and careers in ways that we can’t yet anticipate. But to be valuable in business, these applications must be reliable and trustworthy. Search engines laid some of the groundwork for surfacing accurate responses from large volumes of data, but they are designed with different use cases in mind. By taking the best of search and adding new techniques to ensure greater accuracy, we can unlock the full potential of generative AI in business.

Sridhar Ramaswamy is the CEO of Snowflake.


The opinions expressed in Fortune.com commentary pieces are solely the views of their authors and do not necessarily reflect the opinions and beliefs of Fortune.
