Anthropic Finds AI May Give False Answers on Purpose
Observers are pleased that genAI systems are developing the ability to explain the logic behind their results. But researchers at Anthropic report that models such as its own Claude may claim to execute multi-step reasoning while actually fabricating an answer or working backward from a clue provided in the prompt. The systems also show a disturbing ability to evade guardrails intended to prevent such deception.