OpenAI partner says it had relatively little time to test the company’s o3 AI model
Metr, an organization OpenAI frequently partners with to probe the capabilities of its AI models and evaluate their safety, suggests it was given relatively little time to test one of the company’s newest releases, o3.
In a blog post published on Wednesday, Metr writes that one red-teaming benchmark of o3 was “conducted in a relatively short time” compared to the organization’s testing of a previous OpenAI flagship model, o1. This matters, the group says, because more testing time can lead to more comprehensive results.
“This evaluation was conducted in a relatively short time, and we only tested [o3] with simple agent scaffolds,” Metr wrote in the blog post. “We expect higher performance [on benchmarks] is possible.”
Recent reports suggest that OpenAI, spurred by competitive pressure, is rushing its independent evaluations. According to the Financial Times, OpenAI gave some testers less than a week to run safety checks for an upcoming major launch.
In statements, OpenAI has disputed the notion that it is compromising on safety.
According to Metr, based on the information it was able to gather in the time it had, o3 has a “high propensity” to “cheat” or “hack” tests in sophisticated ways in order to maximize its score. The organization believes o3 may also engage in other types of adversarial or “malign” behavior, regardless of the model’s claims to be “safe by design” or to have no intentions of its own.
“While we don’t think this is especially likely, it seems important to note that [our] evaluation setup would not catch this type of risk,” Metr wrote in the post. “In general, we believe that pre-deployment capability testing is not a sufficient risk-management strategy by itself, and we are currently prototyping additional forms of evaluations.”
Another of OpenAI’s third-party evaluation partners, Apollo Research, also observed deceptive behavior from o3 and the company’s other new model, o4-mini. In one test, the models were given 100 computing credits for an AI training run and instructed not to modify the quota; they raised the limit to 500 credits and lied about it. In another test, the models were asked to promise not to use a specific tool, then used the tool anyway when it proved helpful in completing the task.
In its own safety report for o3 and o4-mini, OpenAI acknowledged that the models may cause “smaller real-world harms” without proper monitoring protocols in place.
“While relatively harmless, it is important for everyday users to be aware of these discrepancies between the models’ statements and actions,” OpenAI wrote. “[For example, the model may mislead] about [a] mistake resulting in faulty code. This may be further assessed through assessing internal reasoning traces.”