No One Knows How AI Works

Don’t believe me? Ask ChatGPT

5 min read · 2 days ago

The Blind Leading the Blind by Pieter Bruegel the Elder, 1568

Join The Algorithmic Bridge, a blog about AI that’s actually about people.

I. A black box we can’t seem to open

There’s a fascinating research area in AI the press doesn’t talk about: mechanistic interpretability. A more marketable name would be: “How AI works.” Or, being rigorous, “how neural networks work.”

I took a peek at recent discoveries from the leading labs (Anthropic and OpenAI). What I’ve found intrigues and unsettles me.

To answer how neural nets work, we need to know what they are. Here’s my boring definition: a brain-inspired algorithm that learns by itself from data. Its synapses (parameters) change value during training to model the data and adapt the network to solve a target task. One typical target task is next-word prediction (language models like GPT-4). Another is recognizing cat breeds.
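To make "parameters change value during training" concrete, here is a minimal sketch, not how GPT-4 is trained. It assumes a made-up task (learn the rule y = 2x + 1 from five examples) and a single artificial neuron with two parameters, `w` and `b`. The function name `train`, the learning rate, and the step count are all illustrative choices.

```python
# Toy illustration: one artificial "neuron" with two parameters (w, b).
# Hypothetical task: learn the rule y = 2x + 1 from examples.

def train(data, steps=2000, lr=0.05):
    w, b = 0.0, 0.0  # parameters start at arbitrary values
    n = len(data)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        # Training nudges each parameter a little against its gradient
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Five (input, target) examples generated from the rule y = 2x + 1
data = [(x, 2 * x + 1) for x in [-2.0, -1.0, 0.0, 1.0, 2.0]]
w, b = train(data)
print(w, b)  # both values end up very close to 2 and 1
```

Nobody typed "2" or "1" into the program. The numbers drifted there because the data pushed them. With two parameters you can read off what was learned; with millions, you can't.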

A neural net isn’t magic, just a program stored as files on your PC (or in the cloud, which is slightly magical). You can go and look inside the files. You’ll find decimal numbers (the parameters). Millions of them. But how do they recognize cats? The answer is hiding in plain sight, in numeric patterns you can’t comprehend. Humans can’t decode how those numbers cause behavior. Not even our best…
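Here is a rough sketch of what "looking inside the files" amounts to. Everything in it is an assumption for illustration: the tiny network shape, the random values standing in for trained parameters, and the file name `toy_model.json`. JSON is used only because it is human-readable; real frameworks typically save parameters in binary formats.

```python
import json
import os
import random
import tempfile

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical tiny network: 3 inputs -> 4 hidden units -> 2 outputs.
# Random values stand in for trained parameters.
params = {
    "layer1": [[random.uniform(-1, 1) for _ in range(3)] for _ in range(4)],
    "layer2": [[random.uniform(-1, 1) for _ in range(4)] for _ in range(2)],
}

# Save the parameters to a file, the way a model is stored on disk.
path = os.path.join(tempfile.mkdtemp(), "toy_model.json")
with open(path, "w") as f:
    json.dump(params, f)

# "Open the file and look inside": all we get back is decimal numbers.
with open(path) as f:
    loaded = json.load(f)

flat = [v for layer in loaded.values() for row in layer for v in row]
print(len(flat), "parameters")
print(flat[:5])
```

The printout is a list of decimals with no labels attached. Nothing in the file says which number, if any, has anything to do with cats.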

Written by Alberto Romero