Latest OpenAI Models Capable of “Scheming”
Does all this talk about autonomous AI agents make you just a tiny bit nervous? With good reason, it seems: OpenAI’s latest o1-preview model “has the basic capabilities needed to do simple in-context scheming,” according to a scorecard prepared for OpenAI by Apollo Research. Echoing any number of science fiction nightmares, OpenAI reports that the new model’s “reasoning skills contributed to a higher occurrence of ‘reward hacking,’” which means pursuing goals in an unintended and undesirable way.