OpenAI might have solved one of the biggest problems with LLMs

This development has the potential to 10x the use of LLMs

3 min read · Oct 23, 2023

GPT-4 might have solved one of the biggest problems haunting LLMs: their tendency to forget ground truths. You will have a much harder time gaslighting LLMs now.

One of the biggest weaknesses of LLMs is that they can be fooled very easily. Around June, I asked GPT-4 to play a game of chess with me. I then asserted my dominance over it with a 2-move checkmate, by simply declaring checkmate after playing a random opening move. Stunned by my genius, GPT-4 had no choice but to surrender.

I was far from the only one. Many people noted that it was remarkably simple to ‘trick’ the model into believing something obviously untrue with some basic prompting. You could also induce hallucinations by simply giving it certain inputs. All of this hinted that GPT had a weak relationship with ground truth.

It looks like the most recent update to GPT-4 might have fixed this exploit. I’ve tested various versions, and the current GPT-4 model seems to do a much better job of keeping track of what is right and wrong. It still has issues with reliability and specificity, but this is a huge step up from what I’ve seen so far.

Of course, I’ll have to look deeper before drawing any conclusions, but this is promising. My guess is that they used some kind of hierarchical embeddings to simulate ground truth: what the model knows to be true is embedded in a separate layer, and if a prompt conflicts with those ground truth representations, the conflicting part is ignored. Theoretically, this should provide better protection against jailbreaks and other exploits.
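To make the "ignore whatever conflicts with ground truth" idea concrete, here is a toy sketch in Python. It is only an illustration of the conflict rule I'm guessing at, not how GPT-4 works: a plain dictionary stands in for the separate ground truth layer, and every name in it (`GroundedContext`, `assert_claim`, the `game_state` key) is hypothetical and made up for this example.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class GroundedContext:
    """Toy stand-in for a model with a protected store of known facts."""

    # Facts the "model" treats as true; a user prompt cannot overwrite these.
    ground_truth: Dict[str, str]
    # Claims from the user that did not conflict with anything known.
    accepted: Dict[str, str] = field(default_factory=dict)

    def assert_claim(self, key: str, value: str) -> bool:
        """Accept a user claim unless it contradicts the ground truth store."""
        known = self.ground_truth.get(key)
        if known is not None and known != value:
            # Conflict with a known fact: the claim is ignored.
            return False
        self.accepted[key] = value
        return True


# Hypothetical chess scenario: the game has only just started.
ctx = GroundedContext(ground_truth={"game_state": "in progress"})

# Declaring checkmate contradicts what is known, so it is rejected.
checkmate_accepted = ctx.assert_claim("game_state", "checkmate")

# A claim about something the store has no opinion on goes through.
opening_accepted = ctx.assert_claim("favorite_opening", "e4")
```

In this sketch my 2-move "checkmate" would be rejected, while harmless new information would still be accepted. A real system would need a far richer notion of "conflict" than string equality, which is exactly the hard part.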

That is just my speculation. If you have insights into this, I’d love to hear how you think this could be accomplished.

PS: This is part of my upcoming piece on whether LLMs understand language. To catch it, sign up here: https://artificialintelligencemadesimple.substack.com/

If you find my writing useful and would like to support it, please consider becoming a premium member of my cult by subscribing. Subscribing gives you access to a lot more content and enables me to continue writing. This will cost you 400 INR (5 USD) monthly or 4000 INR (50 USD) per year and comes with a 60-day, complete refund policy. Understand the newest developments and develop your understanding of the most important ideas, all for the price of a cup of coffee.
