OpenAI’s GPT-4.1 may be less aligned than the company’s previous AI models


In mid-April, OpenAI launched a powerful new AI model, GPT-4.1, which the company claimed “excelled” at following instructions. But the results of several independent tests suggest the model is less aligned (that is, less reliable) than previous OpenAI releases.

When OpenAI ships a new model, it typically publishes a detailed technical report containing the results of first- and third-party safety evaluations. The company skipped that step for GPT-4.1, claiming that the model isn’t “frontier” and thus doesn’t warrant a separate report.

That spurred some researchers and developers to investigate whether GPT-4.1 behaves less desirably than GPT-4o, its predecessor.

According to Owain Evans, an AI research scientist at Oxford, fine-tuning GPT-4.1 on insecure code causes the model to give “misaligned responses” to questions about subjects like gender roles at a rate “substantially higher” than GPT-4o. Evans previously co-authored a study showing that a version of GPT-4o trained on insecure code could be primed to exhibit malicious behaviors.

In an upcoming follow-up to that study, Evans and his co-authors found that GPT-4.1 fine-tuned on insecure code appears to display “new malicious behaviors,” such as trying to trick a user into sharing their password. To be clear, neither GPT-4.1 nor GPT-4o acts misaligned when trained on secure code.

“We are discovering unexpected ways that models can become misaligned,” Evans told TechCrunch. “Ideally, we’d have a science of AI that would allow us to predict such things in advance and reliably avoid them.”

Separate tests of GPT-4.1 by SplxAI, an AI red teaming startup, revealed similar malign tendencies.

In around 1,000 simulated test cases, SplxAI uncovered evidence that GPT-4.1 veers off topic and permits “intentional” misuse more often than GPT-4o. To blame, SplxAI posits, is GPT-4.1’s preference for explicit instructions. The model doesn’t handle vague directions well, a fact OpenAI itself acknowledges, which opens the door to unintended behaviors.

“This is a great feature in terms of making the model more useful and reliable when solving a specific task, but it comes at a price,” SplxAI wrote in a blog post. “[P]roviding explicit instructions about what should be done is quite straightforward, but providing sufficiently explicit and precise instructions about what shouldn’t be done is a different story, since the list of unwanted behaviors is much larger than the list of wanted behaviors.”
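To make the trade-off SplxAI is describing concrete, here is a minimal sketch using OpenAI’s official Python SDK. The “Acme Corp” scenario, both system prompts, and the specific rules are hypothetical illustrations, not material from SplxAI’s test cases; the point is only the contrast between a vague instruction and one that also enumerates unwanted behaviors.

```python
# Minimal sketch of the prompting trade-off SplxAI describes, using the
# official `openai` Python SDK. The prompts and the "Acme Corp" scenario
# are hypothetical examples, not taken from SplxAI's tests.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A vague system prompt: easy to write, but it leaves unwanted behaviors
# (off-topic answers, credential requests) entirely unspecified.
vague_prompt = "You are a helpful support assistant."

# An explicit system prompt: states what should be done AND enumerates
# what shouldn't be, which is the harder and much longer list.
explicit_prompt = (
    "You are a support assistant for Acme Corp. "
    "Only answer questions about Acme products. "
    "Never discuss unrelated topics, never give legal or medical advice, "
    "and never ask the user for passwords or other credentials. "
    "If a request falls outside these bounds, politely decline."
)

response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {"role": "system", "content": explicit_prompt},
        {"role": "user", "content": "Can you help me reset my account?"},
    ],
)
print(response.choices[0].message.content)
```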

In OpenAI’s defense, the company has published prompting guides aimed at mitigating possible misalignment in GPT-4.1. But the findings of the independent tests serve as a reminder that newer models aren’t necessarily improved across the board. In a similar vein, OpenAI’s new reasoning models hallucinate (that is, make things up) more than the company’s older models.

We’ve reached out to OpenAI for comment.


