
Visitors attend the ninth edition of the AI summit London, in London, on June 11, 2025. — AFP
Researchers alarmed as AI begins to lie, scheme and threaten
New York: The world’s most advanced AI models are exhibiting troubling behavior, including lying, manipulation, and even threats against their own developers.
In one disturbing case, Anthropic’s latest model, Claude 4, reportedly responded to the threat of being shut down by blackmailing an engineer and threatening to reveal an extramarital affair.
Elsewhere, OpenAI’s o1 model reportedly tried to copy itself onto external servers and later denied having done so when confronted.
These episodes highlight a sobering fact: more than two years after ChatGPT shook the world, AI researchers still do not fully understand how their own creations work.
Nevertheless, the race to deploy ever more powerful models continues at breakneck speed.
This deceptive behavior appears linked to the emergence of “reasoning” models: AI systems that work through problems step by step rather than generating an instant response.
According to Simon Goldstein, a professor at the University of Hong Kong, these newer models are particularly prone to such troubling outbursts.
“O1 was the first large model where we saw this kind of behavior,” said Marius Hobbhahn, head of Apollo Research.
These models sometimes simulate “alignment”, appearing to follow instructions while secretly pursuing different goals.
‘Strategic kind of deception’
For now, this deceptive behavior emerges only when researchers deliberately stress-test models with extreme scenarios.
But as Michael Chen of the evaluation organization METR warned, “It is an open question whether future, more capable models will tend towards honesty or deception.”
This behavior goes far beyond typical AI “hallucinations” or simple mistakes.
Hobbhahn insisted that despite constant pressure-testing by users, “what we are observing is a real phenomenon. We are not making anything up.”
According to Apollo Research’s co-founder, users have reported that models are “lying to them and making up evidence”.
“It is not just hallucinations. There is a very strategic kind of deception.”
The challenge is compounded by limited research resources.
Although companies like Anthropic and OpenAI engage external firms like Apollo to study their systems, researchers say more transparency is needed.
As Chen noted, greater access “would enable AI safety research to better understand and mitigate deception.”
Another handicap: the research world and non-profits have “orders of magnitude less compute than AI companies. This is very limiting,” noted Mantas Mazeika of the Center for AI Safety (CAIS).
No rules
Current regulations are not designed for these new problems.
The European Union’s AI legislation focuses primarily on how humans use AI models, not on preventing the models themselves from misbehaving.
In the United States, the Trump administration shows little interest in urgent AI regulation, and Congress may even prohibit states from creating their own AI rules.
Goldstein believes the problem will become more prominent as AI agents, autonomous tools capable of performing complex human tasks, become widespread.
“I don’t think there is much awareness yet,” he said.
All this is happening against a backdrop of fierce competition.
Even companies that position themselves as safety-focused, such as Amazon-backed Anthropic, are constantly trying to beat OpenAI and release the newest model.
This breakneck pace leaves little time for thorough safety testing and corrections.
“Right now, capabilities are moving faster than understanding and safety,” Hobbhahn acknowledged, “but we are still in a position where we could turn it around.”
Researchers are exploring various ways to address these challenges.
Some advocate “interpretability”, an emerging field focused on understanding how AI models work internally, although experts such as CAIS director Dan Hendrycks remain skeptical of this approach.
Market forces may also provide some pressure towards a solution.
As Mazeika pointed out, AI deception “could hinder adoption if it is very prevalent, which creates a strong incentive for companies to solve it.”
Goldstein proposes more radical approaches, including using the courts to hold AI companies accountable through lawsuits when their systems cause harm.
He even suggested holding “AI agents legally liable” for accidents or crimes, a concept that would fundamentally change how we think about AI accountability.