Microsoft Researchers See 'Sparks of Artificial General Intelligence' in Powerful AI System
Microsoft researchers claim their A.I. system showed sparks of human-level intelligence. After testing a powerful language model, they argued it exhibited a "deep and flexible understanding" of complex topics and combined knowledge in humanlike ways.
When scientists at Microsoft started testing a new artificial intelligence system last year, they gave it a challenge that seemed to require an intuitive grasp of the physical world. "Here we have a book, nine eggs, a laptop, a bottle and a nail," they said. "Please tell us how to stack them in a stable way."
The researchers were surprised by the ingenuity of the A.I. system's response. Place the eggs on the book, it said. Arrange the eggs in three rows with space between them. Make sure you don't crack them.
"Put the laptop on top of the eggs, with the screen facing down and the keyboard facing up," it wrote. "The laptop will fit snugly within the boundaries of the book and the eggs, and its flat and rigid surface will provide a stable platform for the next layer."
The clever suggestion made the researchers wonder if they were seeing a new kind of intelligence.
In March, they published a 155-page paper arguing that the system was a step toward artificial general intelligence, or A.G.I., which means a machine that can do anything the human brain can do. The paper was published on an internet research archive.

Microsoft, the first major tech company to release a paper making such a bold claim, stirred one of tech's touchiest debates: Are companies building something like human intelligence? Or are some of tech's brightest minds letting their imaginations run away with them?

"I started very skeptical — and ended up frustrated, annoyed, even scared," said Peter Lee, who leads research at Microsoft. "You think: Where is this coming from?"
Microsoft's paper, called "Sparks of Artificial General Intelligence," goes to the heart of what technologists have been working toward — and fearing — for decades. If they build a machine that works like the human brain or even better, it could change the world. But it could also be dangerous. And it could also be nonsense.
Making AGI claims can ruin reputations
What one researcher sees as a sign of intelligence can easily be explained away by another, and the debate often sounds more suited to philosophy than to computer science. Last year, Google fired a researcher who claimed that a similar A.I. system was sentient, a step beyond what Microsoft has claimed. A sentient system would not just be intelligent; it could sense and feel.

But some believe the industry has recently inched toward something that can't be explained away: a new A.I. system that comes up with humanlike answers and ideas that weren't programmed into it. Microsoft has reorganized parts of its research labs to explore the idea. One group will be run by Sébastien Bubeck, the lead author of Microsoft's A.G.I. paper.
About five years ago, companies like Google, Microsoft and OpenAI began building large language models, or LLMs. Those systems analyze vast amounts of digital text and learn to generate text of their own, including papers, poetry and computer code. They can even carry on a conversation.

The technology the Microsoft researchers were working with, OpenAI's GPT-4, is considered the most powerful of those systems. Microsoft has invested $13 billion in OpenAI.

Among other tests, the researchers asked GPT-4 to write a mathematical proof, in rhyming verse, showing that there are infinitely many prime numbers. "At that point, I was like: What's going on?" Bubeck said.

For months, they documented GPT-4's complex behavior, which, they argued, showed a "deep and flexible understanding" of fields from politics to physics and from history to coding, combining knowledge in new ways. "People are amazed at its ability to generate text," Lee said. "But it's far better at analyzing, synthesizing, evaluating and judging text than generating it."
"When asked to draw a unicorn in code, it instantly did. When asked to fix the code to again draw a unicorn after its horn was removed, it did. It wrote a program to assess diabetes risk, a letter as Gandhi to his wife backing an electron for president, and a Socratic dialogue on LLM risks."Everything I thought it couldn't do? It certainly did much of it — if not most," Bubeck said.
Some experts saw "Sparks of A.G.I." as opportunistic hype. LLMs do not grasp the physical world, they argued, and that kind of grounding is required for reasoning. "'Sparks of A.G.I.' co-opts research papers for P.R.," said Maarten Sap, a researcher at Carnegie Mellon University. "They admit their approach may not meet scientific standards."

Bubeck and Lee said they were unsure how to describe GPT-4's behavior and settled on a flashy title to spark interest. Because Microsoft tested an early version of GPT-4, its claims cannot be independently verified; the company says the publicly available version is less powerful.

There are times when LLMs seem to mimic human reasoning, but there are also times when they seem terribly dense. "These behaviors are not always consistent," said Ece Kamar, a researcher at Microsoft.
Alison Gopnik, a professor of psychology who is part of the A.I. research group at the University of California, Berkeley, said that systems like GPT-4 were no doubt powerful, but it was not clear that the text generated by these systems was the result of something like human reasoning or common sense.
"When we see a complicated system or machine, we anthropomorphize it; everybody does that — people who are working in the field and people who aren't," Dr. Gopnik said. "But thinking about this as a constant comparison between A.I. and humans — like some sort of game show competition — is just not the right way to think about it."