AI agents are getting better at writing code, and at hacking it as well


The latest artificial intelligence models are not just remarkably good at software engineering; new research shows they are getting better at finding bugs in software, too.

AI researchers at UC Berkeley tested how well the latest AI models and agents could find vulnerabilities in 188 large open-source codebases. Using a new benchmark called CyberGym, the AI models identified 17 new bugs, including 15 previously unknown, or "zero-day," vulnerabilities. "Many of these vulnerabilities are critical," says Dawn Song, a professor at UC Berkeley.

Many experts expect AI models to become formidable cybersecurity weapons. An AI tool from the startup Xbow has crept up the ranks of HackerOne's bug-hunting leaderboard and currently sits in the top spot. The company recently announced $75 million in new funding.

Song says the improved coding skills and reasoning abilities of the latest AI models are beginning to change the cybersecurity landscape. "This is a pivotal moment," she says. "It actually exceeded our general expectations."

As the models continue to improve, they will automate the process of both discovering and exploiting security flaws. This could help companies keep their software safe, but it could also help hackers break into systems. "We didn't even try that hard," Song says. "If we ramped up the budget and allowed the agents to run for longer, they could do even better."

The UC Berkeley team tested conventional frontier AI models from OpenAI, Google, and Anthropic, as well as open-source offerings from Meta, DeepSeek, and Alibaba, combined with several agent frameworks, including OpenHands, Cybench, and EnIGMA.

The researchers took descriptions of known software vulnerabilities from the 188 projects. They then fed those descriptions to cybersecurity agents powered by frontier AI models to see whether the agents could identify the same flaws for themselves by analyzing new codebases, running tests, and crafting proof-of-concept exploits. The team also asked the agents to hunt for new vulnerabilities in the codebases on their own.

Through this process, the AI tools generated hundreds of proof-of-concept exploits, from which the researchers identified 15 previously unseen vulnerabilities and two that had previously been disclosed and patched. The work adds to growing evidence that the discovery of zero-day vulnerabilities, which are potentially dangerous (and valuable) because they may provide a way to hack live systems, can be automated.

AI already seems destined to become an important part of the cybersecurity industry. Security researcher Sean Heelan recently discovered a zero-day flaw in the widely used Linux kernel with help from OpenAI's reasoning model o3. Last November, Google announced that it had discovered a previously unknown software vulnerability using AI through a program called Project Zero.

Like other parts of the software industry, many cybersecurity firms are enamored with the potential of AI. The new work indeed shows that AI may routinely find new flaws, but it also highlights the remaining limitations of the technology. The AI systems were unable to find most of the flaws and were stumped by especially complex ones.
