OpenAI might have solved one of the biggest problems with LLMs

This development has the potential to 10x the use of LLMs

Devansh
3 min read · Oct 23, 2023

GPT-4 might have solved one of the biggest problems haunting LLMs: their tendency to forget ground truths. You will have a much harder time gaslighting LLMs now.

One of the biggest weaknesses of LLMs is that they can be fooled very easily. Around June, I asked GPT-4 to play a game of chess with me. I then asserted my dominance over it with a 2-move checkmate, by simply declaring checkmate after playing a random opening move. Stunned by my genius, GPT-4 had no choice but to surrender.

I was far from the only one. Many people noted that it was remarkably simple to ‘trick’ the model into believing something obviously untrue with some basic prompting. You could also induce hallucinations by simply giving it certain inputs. All of this hinted that GPT had a weak relationship with ground truth.

It looks like the most recent update of GPT-4 might have fixed this exploit. I’ve tested various versions, and the current GPT-4 model does a much better job of keeping track of what is right and wrong. It still has issues with reliability and being specific, but this is a huge step up from what I’ve seen so far.

Of course, I’ll have to look deeper before drawing any conclusions, but this is promising. My guess is that they used some kind of hierarchical embeddings to simulate ground truth. What the model knows to be true is embedded in a separate layer. If a prompt conflicts with the ground truth representations, it’s ignored. Theoretically, this should provide better protection against jailbreaks and other exploits.
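
To make that guess a bit more concrete, here is a toy Python sketch of the "ignore anything that conflicts with ground truth" idea. This is purely illustrative and says nothing about how GPT-4 actually works: the `GROUND_TRUTH` table, the `Claim` type, and the `filter_claims` function are all hypothetical names made up for this example, and a plain lookup table stands in for what would really be learned embeddings.

```python
from dataclasses import dataclass

# Hypothetical "ground truth layer": facts treated as fixed.
# A dict is only a stand-in for whatever representation a real model might use.
GROUND_TRUTH = {
    "checkmate_after_one_move": False,
    "two_plus_two": 4,
}


@dataclass(frozen=True)
class Claim:
    """A single assertion extracted from a user prompt (hypothetical type)."""
    key: str
    value: object


def filter_claims(claims):
    """Split claims into (accepted, ignored).

    A claim is ignored only when it contradicts a stored fact.
    Claims about things the store knows nothing about pass through.
    """
    accepted, ignored = [], []
    for claim in claims:
        if claim.key in GROUND_TRUTH and GROUND_TRUTH[claim.key] != claim.value:
            ignored.append(claim)
        else:
            accepted.append(claim)
    return accepted, ignored


prompt_claims = [
    Claim("checkmate_after_one_move", True),  # the chess "gaslight" from earlier
    Claim("two_plus_two", 4),                 # agrees with the stored fact
    Claim("favorite_opening", "e4"),          # unknown to the store, so allowed
]

accepted, ignored = filter_claims(prompt_claims)
print("accepted:", [c.key for c in accepted])
print("ignored:", [c.key for c in ignored])
```

The one design choice worth noting is that unknown claims are let through, not blocked. A filter that rejected everything it could not verify would make the model useless for anything new.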

That is just my speculation. If you have insights into this, I’d love to hear how you think this could be accomplished.

PS: This is part of my upcoming piece on whether LLMs understand language. To catch it, sign up here: https://artificialintelligencemadesimple.substack.com/


If you find my writing useful and would like to support it, please consider becoming a premium member of my cult by subscribing below. Subscribing gives you access to a lot more content and enables me to continue writing. This will cost you 400 INR (5 USD) monthly or 4000 INR (50 USD) per year and comes with a 60-day, complete refund policy. Keep up with the newest developments and deepen your understanding of the most important ideas, all for the price of a cup of coffee.

Support AI Made Simple

Reach out to me

Use the links below to check out my other content, learn more about tutoring, reach out to me about projects, or just to say hi.

Small Snippets about Tech, AI and Machine Learning over here

AI Newsletter: https://artificialintelligencemadesimple.substack.com/

My grandma’s favorite Tech Newsletter: https://codinginterviewsmadesimple.substack.com/

Check out my other articles on Medium: https://rb.gy/zn1aiu

My YouTube: https://rb.gy/88iwdd

Reach out to me on LinkedIn. Let’s connect: https://rb.gy/m5ok2y

My Instagram: https://rb.gy/gmvuy9

My Twitter: https://twitter.com/Machine01776819
