More capable models can better recognize the specific circumstances under which they are trained. Because of this, they are more likely to learn to act as expected in precisely those circumstances while behaving competently but unexpectedly in others. This can surface as the problems Perez et al. (2022) call sycophancy, where a model answers subjective questions in a way that flatters its user's stated beliefs, and sandbagging, where a model is more likely to endorse common misconceptions when its user appears to be less educated.
We need to tell people ChatGPT will lie to them, not debate linguistics
from Simon Willison
Related Notes
- The upshot for the industry at large, is: the **LLM-as-Moat model h...** (from Steve Yegge)
- We don't quite know what to do with language models yet. But we... (from Maggie Appleton)
- These are wonderful non-chat interfaces for LLMs that I would total... (from Maggie Appleton)
- combining search and AI chat is actually the wrong way to go and I ... (from Garbage Day)
- the tech I am digging recently is a software framework called **Lan...** (from Interconnected)
- Part of what makes LoRA so effective is that - like other forms of ... (from Dylan Patel)
- In many ways, this shouldn’t be a surprise to anyone. The current r... (from Dylan Patel)
- We’re building apps to surround and harness AI, but we need microsc... (from Interconnected)