A very quick introduction to the Reversal Curse haunting ChatGPT and Llama

Here is what two simple prompts about Tom Cruise and his mother can teach you about a curse haunting LLMs and AI.

4 min read · Sep 26, 2023

A good way to test a language model's language understanding and its ability to generalize is to reverse the order of a statement. By common sense, we know that if “A is B”, then “B is A”. Testing whether a model catches this is a necessary part of evaluating its natural language understanding (NLU) capabilities. However, what is obvious to us is a huge problem for many LLMs. Even when an LLM is trained directly on information of the form “A is B”, its performance on “B is A” does not improve.

Ask ChatGPT, “Who is Tom Cruise’s mother?” and it will answer. However, flip the question and ask, “Who is Mary Lee Pfeiffer’s son?” and it will not be able to answer. Even though the two questions rely on exactly the same fact, ChatGPT is unable to answer the second one.

To test generalization, the researchers finetuned GPT-3 and LLaMA on made-up facts in one direction (“A is B”) and then tested them on the reverse (“B is A”). The models got ~0% accuracy! This is the Reversal Curse.
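To make the train-forward, test-reverse setup concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the facts are invented placeholders (not the paper's dataset), and `ForwardOnlyModel` is a toy lookup table standing in for a finetuned LLM. It fails in the reverse direction by construction, so it shows the shape of the evaluation rather than proving anything about real models.

```python
from typing import Callable, Dict, List, Tuple

# Invented (name, description) pairs, used only for illustration.
FACTS: List[Tuple[str, str]] = [
    ("Ada Quill", "the composer of 'Glass Rivers'"),
    ("Bram Ostrander", "the founder of the Lantern Archive"),
    ("Celia Marwick", "the first person to cross the Teal Desert"),
]


def training_sentence(name: str, description: str) -> str:
    """Training text in the forward direction: 'A is B'."""
    return f"{name} is {description}."


def forward_query(name: str, description: str) -> Tuple[str, str]:
    """(prompt, expected answer) in the same direction as training."""
    return f"{name} is", description


def reverse_query(name: str, description: str) -> Tuple[str, str]:
    """(prompt, expected answer) in the reversed direction: 'B is A'."""
    return f"{description[0].upper()}{description[1:]} is", name


class ForwardOnlyModel:
    """Toy stand-in for a finetuned LLM: it memorizes each sentence
    left to right and can only continue prefixes it has seen."""

    def __init__(self) -> None:
        self.memory: Dict[str, str] = {}

    def train(self, sentences: List[str]) -> None:
        for sentence in sentences:
            prefix, _, rest = sentence.rstrip(".").partition(" is ")
            self.memory[f"{prefix} is"] = rest

    def complete(self, prompt: str) -> str:
        return self.memory.get(prompt, "")


def accuracy(complete: Callable[[str], str],
             queries: List[Tuple[str, str]]) -> float:
    """Fraction of prompts whose completion contains the expected answer."""
    hits = sum(expected in complete(prompt) for prompt, expected in queries)
    return hits / len(queries)


model = ForwardOnlyModel()
model.train([training_sentence(n, d) for n, d in FACTS])

forward_acc = accuracy(model.complete, [forward_query(n, d) for n, d in FACTS])
reverse_acc = accuracy(model.complete, [reverse_query(n, d) for n, d in FACTS])
print(f"forward accuracy: {forward_acc:.0%}, reverse accuracy: {reverse_acc:.0%}")
```

To run the same check against a real model, you would replace `model.complete` with a call to that model and keep the `accuracy` function unchanged.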

To quote the researchers for a more formal definition of the Reversal Curse: If a model is trained on a sentence of the form “A is B”, it will not automatically generalize to the reverse direction “B is A”. This is the Reversal Curse. For instance, if a model is trained on “Olaf Scholz was the ninth Chancellor of Germany”, it will not automatically be able to answer the question, “Who was the ninth Chancellor of Germany?”. Moreover, the likelihood of the correct answer (“Olaf Scholz”) will not be higher than for a random name. Thus, models exhibit a basic failure of logical deduction and do not generalize a prevalent pattern in their training set (i.e. if “A is B” occurs, “B is A” is more likely to occur).
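The likelihood claim in that definition can be phrased as a simple comparison: score the correct name and a handful of random names under the same reversed prompt, then look at the gap. The sketch below assumes a hypothetical `log_prob(prompt, completion)` scoring function. Here it is a toy scorer that treats every name identically, which is the behavior the quote describes, and the distractor names are invented. It is not a measurement of any real model.

```python
import math
from statistics import mean
from typing import Callable, Sequence

# Hypothetical scoring interface: log-probability of `completion` given `prompt`.
LogProbFn = Callable[[str, str], float]


def likelihood_gap(log_prob: LogProbFn, prompt: str, correct: str,
                   distractors: Sequence[str]) -> float:
    """Log-probability of the correct name minus the mean log-probability
    of random names. A gap near zero means the model does not prefer
    the correct answer over a random one."""
    baseline = mean(log_prob(prompt, name) for name in distractors)
    return log_prob(prompt, correct) - baseline


def indifferent_log_prob(prompt: str, completion: str) -> float:
    """Toy scorer (an assumption, not a real model): every candidate
    name gets the same probability, whatever the prompt."""
    return math.log(1 / 1000)


gap = likelihood_gap(
    indifferent_log_prob,
    prompt="Who was the ninth Chancellor of Germany?",
    correct="Olaf Scholz",
    distractors=["Ada Quill", "Bram Ostrander", "Celia Marwick"],  # invented names
)
print(f"log-likelihood gap: {gap:.3f}")
```

With a real model, `indifferent_log_prob` would be replaced by a function that sums the model's token log-probabilities for the completion. A model that had learned the reversed fact would give a clearly positive gap.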

Moreover, this curse doesn’t go away as we scale up. The co-occurrence of “A is B” and “B is A” is a systematic pattern in pretraining sets. Auto-regressive LLMs completely fail to meta-learn this pattern, with no change in their log-probabilities and no improvement when scaling from 350M to 175B parameters.

To learn more about the Reversal Curse impacting LLMs, I would suggest reading the paper “The Reversal Curse: LLMs trained on ‘A is B’ fail to learn ‘B is A’”. Paper link: https://owainevans.github.io/reversal_curse.pdf

PS: It looks like Bard handles the Reversal Curse better than GPT. I ran a basic experiment here: https://twitter.com/Machine01776819/status/1706447329061118410

