
The Missing Piece: Symbolic AI’s Role in Solving Generative AI Hurdles

Symbolic reasoning may still be alive…

Photo by Arno Senoner on Unsplash

Hallucinations, factual errors, declining public interest, and a plunge in investment: all of these and more have been cited recently in news pieces and commentary that read like the harbingers of a serious setback for Generative AI. Some have even proclaimed the demise of the AI "bubble," something they "predicted a long time ago."

In this post, I discuss how the current hurdles of Generative AI systems could be (or already have been?) mitigated with the help of good old symbolic reasoning.

But first, let’s spell out those hurdles.

Take, for instance, something as basic as arithmetic. Yesterday I asked ChatGPT for the cube root of 123,456.78 and got 49.3989 as the answer. Then I asked ChatGPT to multiply 49.3989 by itself twice, which should get back to my 123,456.78, and –no kidding– ChatGPT came up with 123,456.78 as the result, supposedly verifying the correctness of its operation. I could have believed it, but just to be sure, I redid the operation with a calculator: multiplying 49.3989 by itself twice gives 120,545.73, which is almost 3,000 units away from the intended result. That means, of course, that the first operation (the cube root) was wrong in the first place!

Further, when I told ChatGPT about its mistake, it told me that it was "a rounding mistake," which would mean that four decimals were not enough. But when I asked my calculator for the cube root of 123,456.78, it gave me 49.7934 instead of ChatGPT’s 49.3989. You can see that the difference is in the first decimal digit, far from a mere "rounding error." Sorry, ChatGPT, an error is an error.
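If you want to reproduce the check yourself, the whole thing fits in a few lines of Python:

```python
# The operation I asked ChatGPT to perform: the cube root of 123,456.78
actual = 123_456.78 ** (1 / 3)
print(round(actual, 4))  # 49.7934, matching the calculator

# ChatGPT's answer, cubed: it should come back to ~123,456.78, but doesn't
claimed = 49.3989
print(round(claimed ** 3, 2))  # 120545.73, almost 3,000 units short
```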

So, in short, the calculator on your phone is light years ahead of ChatGPT’s arithmetic capabilities. How is this possible when ChatGPT is one of the most expensive products ever developed?

One problem with Generative AI chatbots is that they have been trained to "sound correct," not to be correct. And this shows in ChatGPT, Gemini and other otherwise advanced chatbots. Some say that this is a consequence of RLHF (reinforcement learning from human feedback), which forces the AI to agree with its human teachers instead of looking for correct answers.

OK, let’s try not to be negative. How can factual mistakes be corrected, at least in arithmetic? How could we, let’s imagine, couple a calculator with ChatGPT?

It turns out, this idea has actually been tried, with some success (read on).

The battle between symbolic AI and neural networks

I started working on AI in the mid-1980s, when I was studying for my PhD. At the time, there was something of a feud between the partisans of symbolic AI and those of neural nets, which we in the symbolic camp dismissively called "subsymbolic." I was fascinated with the applications of formal logic and deduction to things like program construction (this ended up being the topic of my PhD thesis).

Fast-forward to the present day, and deep neural nets are all the rage. OpenAI (which didn’t exist back then) has achieved more success with deep learning and large language models (LLMs) on tasks such as translation than was ever possible with symbolic methods.

The irony of this story is that even at symbolic tasks, such as generating programming code or translating between languages, neural nets turned out to be more capable than symbolic approaches. Yes, our despised "subsymbolic" methods ended up doing symbolic jobs better.

Or did they?

Given the many limitations of the otherwise wonderful current conversational AI systems (chatbots like ChatGPT), perhaps the last word has yet to be said.

Perhaps the good old symbolic methods have something of value to offer.

Hybrid systems: the best of both worlds?

As I mentioned above, the idea of coupling a calculator with AI chatbots has already been put into practice: not just a calculator, in fact, but an entire symbolic reasoning system, the powerful Wolfram Alpha. In case you haven’t tried it, open a browser tab and head to https://www.wolframalpha.com/

It is the brainchild of Stephen Wolfram, a British-American computer scientist, physicist, and mathematician, whose company, Wolfram Research, launched Wolfram Alpha in 2009.

With Wolfram Alpha, you can:

  • Solve equations.
  • Differentiate and integrate.
  • Compute numerical operations (of course).
  • Generate plots and graphs for functions.
  • Analyze datasets.
  • Draw graphs for data.
  • Analyze financial data.
  • … and much more, including translations, text analysis, health information, geography, astronomy, you name it (I’m not related in any way to Wolfram Alpha).

OpenAI has coupled Wolfram Alpha with GPT-4 as an extension called a "plugin." It is called from GPT-4 through an API, and access is available only to paying customers of OpenAI products.

The approximate steps for calling the Wolfram plugin are:

  • First, GPT-4 analyzes the user request to see whether it needs help from Wolfram.
  • If so, it formats the relevant information into an API request to the Wolfram plugin.
  • Results from the Wolfram plugin are delivered back to GPT-4 as structured data.
  • GPT-4 produces a natural-language response that is handed back to the user.
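OpenAI hasn’t published the internals of this machinery, but the control flow above can be sketched in a few lines of Python. Every name here is a hypothetical stand-in, with deliberately toy implementations (the "Wolfram" stand-in only does cube roots) just to make the flow runnable:

```python
import re

def needs_wolfram(query):
    """Step 1: a crude router -- does the query look like math?"""
    return bool(re.search(r"\d", query))

def call_wolfram(query):
    """Steps 2-3: stand-in for the plugin API call; returns structured data."""
    number = float(re.search(r"[\d.]+", query.replace(",", "")).group())
    return {"input": number, "cube_root": round(number ** (1 / 3), 6)}

def llm_generate(query, context=None):
    """Step 4: stand-in for GPT-4 verbalizing the structured data."""
    if context is not None:
        return f"The cube root is {context['cube_root']}."
    return "Answering from my own parameters (and maybe hallucinating)."

def answer(query):
    # Route through the symbolic engine only when it is needed
    if needs_wolfram(query):
        return llm_generate(query, context=call_wolfram(query))
    return llm_generate(query)

print(answer("What is the cube root of 123,456.78?"))
# The cube root is 49.793385.
```

Note that the hard part in the real system is precisely the router: deciding reliably when the symbolic engine is needed is much harder than a regex.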

Of course, there are some delicate aspects of this process. For instance, how do we know when Wolfram is needed and when it isn’t? But when things work well, you won’t get factual mistakes like the operations I presented at the beginning of this post (where I ran ChatGPT without the Wolfram plugin).

It’s also worth noting that Wolfram Alpha (accessed through its web page) now takes natural language queries, like "What is the cube root of 123,456.78?" It will give you the right answer (49.793385…) with no BS, unlike plain ChatGPT.

The two possible ways to couple a symbolic reasoning system with an LLM

With the Wolfram plugin, we saw one possible way of coupling a symbolic reasoning system (like Wolfram Alpha) with an LLM like ChatGPT. It makes the symbolic system a "slave" of ChatGPT: ChatGPT is called first, and Wolfram afterward.

The other way around is to call the symbolic system first and the LLM afterward. How could this be done?

Actually, this has already been done by several AI developers, and for sure by Google.

It’s called "RAG" (Retrieval Augmented Generation) and can be done in several ways, but the one I’m going to explain uses the "Knowledge Graph" (KG) using Google’s terminology.

The KG is (I don’t want to be technical here) a collection of interrelated facts that we know, such as that Washington is the capital of the United States.

A KG is stored as a collection of "triplets," composed of "nodes," "edges," and "labels." For instance, for the city of Washington, we can have the triplet (Washington, capitalOf, US), where Washington and US are nodes, and there is an edge from Washington to US with the label "capitalOf." That’s it.
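To make the idea concrete, here is a toy triplet store in Python, using the common subject-predicate-object ordering. It’s a sketch of the data structure, not how Google actually implements its KG:

```python
# A tiny toy KG as (subject, predicate, object) triplets
triples = [
    ("Washington", "capitalOf", "US"),
    ("Paris", "capitalOf", "France"),
    ("US", "locatedIn", "North America"),
]

def query(subject=None, predicate=None, obj=None):
    """Return every triplet matching the fields that were given."""
    return [
        (s, p, o)
        for s, p, o in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]

print(query(predicate="capitalOf"))
# [('Washington', 'capitalOf', 'US'), ('Paris', 'capitalOf', 'France')]
```

A production KG replaces the linear scan with indexes over all three fields, but the query interface is essentially this.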

A good KG stores millions of triplets, so not everybody can make one. Google has been building and polishing a KG since 2010, when it acquired Metaweb, a company that had been working on this for years. That means Google’s KG has been over 14 years in the making!

A KG is a form of "Knowledge Base," a term that comes from symbolic AI. It refers to "knowledge" because the KG stores things that people know, like countries’ capital cities, presidents, kings, and so on –as opposed to obscure weights inside a neural network. Personally, I spent the first 10 years of my research career on topics related to knowledge representation and knowledge bases.

Next, how is the KG used with RAG?

You know that LLMs tend to make things up when they can’t find the real facts about a subject. So, what RAG does is give the LLM the relevant information, which in this case is taken from the KG.

The information from the KG is fed into the LLM, concatenating it with the user query.

The steps involved in using the KG with RAG for a user query are:

  • Using the user’s request, make a query to the KG.
  • The N most relevant triplets from the KG are fed to the LLM together with the user’s query.
  • The LLM gives its answer to the user, taking into account the additional information.
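The steps above can be sketched as follows; `retrieve_triplets` and the prompt format are my own simplifications (real systems use embedding search or entity linking to rank relevance), not Google’s actual pipeline:

```python
def retrieve_triplets(kg, user_query, n=3):
    """Step 1: naive relevance -- keep triplets whose subject or object
    literally appears in the query (real systems use entity linking)."""
    hits = [t for t in kg if t[0] in user_query or t[2] in user_query]
    return hits[:n]

def build_prompt(kg, user_query):
    """Step 2: concatenate the retrieved facts with the user's query;
    the combined text is what gets sent to the LLM (step 3)."""
    facts = "\n".join(f"- {s} {p} {o}" for s, p, o in retrieve_triplets(kg, user_query))
    return (f"Known facts:\n{facts}\n\n"
            f"Question: {user_query}\nAnswer using only the facts above.")

kg = [("Washington", "capitalOf", "US"),
      ("Paris", "capitalOf", "France")]
print(build_prompt(kg, "What is the capital of France?"))
```

The LLM now answers from the retrieved facts placed directly in its context, instead of relying solely on whatever got compressed into its weights.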

It has been found that this form of RAG helps a lot to "ground" the chatbot with real information and to make it give more truthful responses.

Notice that I wrote "more truthful responses" instead of "truthful responses" because, currently, LLMs don’t offer a guarantee of correctness, with or without RAG and a KG. However, the improvement has been documented and objectively measured.

Closing thoughts

The two ways of improving LLMs’ truthfulness I mentioned above are not exclusive: you can put RAG before and calls to Wolfram or another knowledge-based symbolic reasoner after. I suspect, though, that this could slow down the whole process and make the chatbot less responsive to the user.

There are other possible avenues for combining symbolic reasoning and LLMs, but in this post, I wanted to highlight what has been implemented and proven.

Some researchers are experimenting with a "mixture of experts" approach, where some of those experts could be symbolic reasoners, to be consulted mostly when the user’s question calls for precision and rigor. As I showed above, even arithmetic operations fit the bill.

I spent all my AI research career in the symbolic camp, and perhaps I dismissed neural nets a bit, but that was just my personal expertise (and ignorance) and says nothing about the relative merits of the two.

In the end, perhaps the old battle of the symbolic camp "against" neural networks can be settled with an alliance between the two.

I’d be OK with it.

Get my personally curated AI news analysis and tech explainers with my free newsletter, "The Skeptic AI Enthusiast," at https://rafebrena.substack.com

