Two leading artificial intelligence companies recently introduced tools for creating teams of AI agents. As part of this industry trend, researchers at the AI company Anthropic described a major coding experiment. Researcher Nicholas Carlini set sixteen separate instances of the Claude Opus AI model to work on one goal: building a C compiler from scratch. The project shows the potential of AI teams but also reveals serious current limitations.
Carlini gave the AI agents very little direction. Over two weeks, they completed nearly 2,000 coding sessions, costing around $20,000 in AI usage fees. The result is a compiler of about 100,000 lines, written in the Rust programming language. It can build a bootable version of the Linux kernel for three major processor architectures: x86, ARM, and RISC-V.
To manage the project, Carlini used a new Claude feature called "agent teams." Each agent ran in its own isolated container, and all of them worked from a shared online code repository. The agents claimed tasks by creating lock files and pushed their finished code back to the shared repository. No central coordinator directed them. Each agent independently picked the most obvious problem to solve next and started working. When code changes conflicted, the agents resolved the conflicts themselves.
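The lock-file claiming described above can be illustrated with a minimal sketch. It is not Carlini's actual setup: it assumes a plain local directory stands in for the shared repository, and the names `claim_task`, `release_task`, and `pick_next_task` are hypothetical. The idea it shows is that creating the lock file must be atomic, so that two agents cannot both claim the same task.

```python
import os
import tempfile
from pathlib import Path
from typing import Iterable, Optional


def claim_task(lock_dir: Path, task: str, agent: str) -> bool:
    """Try to claim a task by atomically creating its lock file.

    Assumption: lock_dir is a directory every agent can see. In the
    experiment the shared space was a code repository, not a local folder.
    """
    lock_dir.mkdir(parents=True, exist_ok=True)
    lock_path = lock_dir / f"{task}.lock"
    try:
        # O_CREAT | O_EXCL makes creation fail if the file already exists,
        # so at most one claimant can succeed.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as lock_file:
        lock_file.write(agent)  # record who holds the claim
    return True


def release_task(lock_dir: Path, task: str) -> None:
    """Remove the lock once the work has been sent back."""
    (lock_dir / f"{task}.lock").unlink(missing_ok=True)


def pick_next_task(lock_dir: Path, tasks: Iterable[str], agent: str) -> Optional[str]:
    """Claim the first task that nobody else holds, with no central coordinator."""
    for task in tasks:
        if claim_task(lock_dir, task, agent):
            return task
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        locks = Path(tmp) / "locks"
        todo = ["parse-structs", "codegen-riscv"]  # hypothetical task names
        print(pick_next_task(locks, todo, "agent-1"))
        print(pick_next_task(locks, todo, "agent-2"))
```

In this sketch each agent simply walks the same task list and takes the first unclaimed entry, which mirrors the "no central coordinator" design: coordination comes only from the lock files themselves.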
The finished compiler is available on GitHub. It can compile several major open-source projects, including PostgreSQL, SQLite, Redis, FFmpeg, and QEMU. In testing, it passed 99 percent of the GCC torture test suite, a demanding set of stress tests for compiler correctness. In what Carlini called "the developer’s ultimate litmus test," the compiler also successfully compiled and ran the classic video game Doom.
A C compiler is a nearly ideal task for semi-independent AI coding. The rules of the C language are decades old and very clearly defined. Comprehensive test suites already exist. There is also a known, trusted reference compiler, GCC, to check results against. Most real software projects have none of these advantages. The real difficulty in most development is not writing code that passes tests; it is deciding what the tests should be in the first place.
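Checking results against a trusted reference compiler can be sketched as a simple differential test: build the same C program with both compilers, run both binaries, and compare what they do. This is an illustration under stated assumptions, not the project's actual test harness. It assumes both compilers accept a GCC-style `compiler source.c -o output` command line, and the names `RunResult`, `compile_and_run`, and `describe_mismatch` are hypothetical.

```python
import subprocess
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class RunResult:
    compiled: bool                   # did the compiler accept the program?
    exit_code: Optional[int] = None  # exit code of the built program
    stdout: str = ""                 # what the built program printed


def compile_and_run(compiler: str, source: Path, workdir: Path,
                    timeout: float = 10.0) -> RunResult:
    """Build `source` with `compiler`, then run the resulting binary.

    Assumption: the compiler uses a GCC-style command line. Timeouts are
    not handled here and would raise subprocess.TimeoutExpired.
    """
    exe = (workdir / f"{source.stem}-{Path(compiler).name}").resolve()
    build = subprocess.run([compiler, str(source), "-o", str(exe)],
                           capture_output=True, text=True, timeout=timeout)
    if build.returncode != 0:
        return RunResult(compiled=False)
    run = subprocess.run([str(exe)], capture_output=True, text=True,
                         timeout=timeout)
    return RunResult(True, run.returncode, run.stdout)


def describe_mismatch(reference: RunResult, candidate: RunResult) -> Optional[str]:
    """Return a short description of the disagreement, or None if they agree."""
    if reference.compiled != candidate.compiled:
        return "only one compiler accepted the program"
    if not reference.compiled:
        return None  # both rejected the program, which counts as agreement
    if reference.exit_code != candidate.exit_code:
        return f"exit code {candidate.exit_code}, expected {reference.exit_code}"
    if reference.stdout != candidate.stdout:
        return "stdout differs"
    return None
```

A harness built on this would call `compile_and_run` once with `gcc` and once with the compiler under test (for example a hypothetical `./mycc`), then pass both results to `describe_mismatch`. The comparison is only possible because a trusted reference exists, which is exactly the advantage most real projects lack.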