The Coffee Test

In the face of ChatGPT, the Turing Test now seems passé.

Alexa and Siri took us close to the edge years ago, of course, offering us conversational exchanges that were identifiable as artificial only by the annoying cheerfulness of their tone and their inability to expound. But ChatGPT is an Energizer Bunny; it can adopt a vast range of emotional tones, and will go on and on and on if you want it to.

And with that, we have an AI that can pass for human. I don’t know if a formal study has been done, but I’m pretty certain that ChatGPT, and bots like it, could hang in with many users for the required five minutes and confound the 70 percent threshold Turing set for accurate human-or-machine identification.

But, of course, there’s not a shred of actual intelligence to be found in ChatGPT or any of its rapidly proliferating siblings.

We need a new test!

Steve Wozniak, co-founder of Apple and personal computer pioneer, provides a replacement: The Coffee Test.

The Coffee Test is simple. It goes like this:

“A machine is required to enter an average American home and figure out how to make coffee: find the coffee machine, find the coffee, add water, find a mug, and brew the coffee by pushing the proper buttons.”

It’s no overstatement to say that this test is downright elegant. It includes recognition, classification, process, and improvisation, all bundled up in a problem that anyone can understand. It teases out the complexity of artificial general intelligence – solving multiple problems at once with no specific guidance – in an example that is disarmingly simple.

Wozniak’s well-known penchant for simplification, which opened the door to true personal computing almost 50 years ago, is on parade here: he’s come up with a bare-bones test that pulls all the right levers. The AI being tested (presumably a robot with vision) enters “an average American home” – one unfamiliar to the robot – and must, first, locate the kitchen.

This it must do based on visual cues, and an internal world model informing it that the kitchen will contain cabinet storage for dishes and spices and so on, appliances for food preparation and refrigeration, a sink providing water, a table for sitting and eating, drawers with silverware, and possibly a pantry for dry food storage.

Once the kitchen has been located and recognized, the robot must continue with recognition tasks, locating (by informed trial and error) the ingredients necessary for coffee preparation: ground coffee, coffee filters, and possibly peripheral condiments such as sugar and creamer.

Then it must identify the coffeemaker – a device that comes in a staggering variety of shapes and sizes. And it must examine the coffeemaker to determine its specific operating parameters: where the water goes, where the filter goes, etc.

Now the robot moves on to process challenges, translating what it knows about the correct order of operations in coffee preparation – put water in the coffeemaker, put the filter in place, put a measured quantity of coffee grounds in the filter, close anything that needs closing, ensure that the coffee pot is in place – into improvisational steps, as these steps vary widely in nuance between coffeemakers of different make and model.

Then the big challenge, which can often confound human beings – figuring out how to work the damn thing, if it’s a coffeemaker you’ve never seen before. Oh, some are very straightforward – just turn it on, and coffee will brew; but many of them are Space Shuttle-complicated, with timers and brew settings, self-cleaning functions, etc. The robot has to puzzle that out, on the spot, and get it right.

Once the coffee is brewed, the robot must locate a mug, pour coffee into the mug, optionally adding cream and sugar in reasonable proportions. Presentation of the coffee to a referee might be required; Wozniak was not specific on this point.
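For readers who like to see a thing spelled out, the walk-through above can be sketched as an ordered checklist in which the robot fails the test at the first stage it cannot complete. This is only a rough illustration: the stage names are my paraphrase rather than anything Wozniak specified, and `attempt` is a hypothetical stand-in for whatever the robot actually does.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Stage names paraphrase the walk-through above; they are illustrative labels,
# not an official specification of the test.
STAGES = [
    "locate the kitchen",
    "find coffee and filters",
    "identify the coffeemaker",
    "load water, filter, and grounds",
    "operate the controls",
    "find a mug and pour",
]


@dataclass
class Result:
    passed: bool
    failed_stage: Optional[str]  # first stage the robot could not complete


def run_coffee_test(attempt: Callable[[str], bool]) -> Result:
    """Run the stages in order; the test fails at the first stage that fails.

    `attempt` is a hypothetical callback standing in for the robot: it takes
    a stage name and reports whether the robot completed that stage.
    """
    for stage in STAGES:
        if not attempt(stage):
            return Result(passed=False, failed_stage=stage)
    return Result(passed=True, failed_stage=None)


# A stand-in "robot" that handles everything except an unfamiliar control panel.
def stumped_by_buttons(stage: str) -> bool:
    return stage != "operate the controls"


print(run_coffee_test(stumped_by_buttons))
```

The point of the sketch is the all-or-nothing chain: being brilliant at five of the six stages still leaves you without coffee.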

Can we agree this is a superb test? It requires a range of cognitive abilities, asks its subject to search, interpret, and infer, shifting smoothly between tasks. It gets more interesting still if the test is deployed in such a way as to require the robot to troubleshoot if problems occur.

We can push the test further. We can train the robot to make coffee, then send it out into another random American home with instructions to make hot chocolate. Or tea. Or lemonade. Can it then generalize from its coffee experience and construct an improvisational plan to meet the new challenge? Can it identify the steps and components that are different in preparing those other beverages? Can it cope with the variations it would encounter in where the components might be found?
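One crude way to picture the "what is different?" question is as a comparison between what the robot already knows and what the new drink needs. The sketch below is a toy under loud assumptions: the component lists for coffee and tea are my own guesses, and `plan_transfer` is a hypothetical helper, not part of any real robotics system.

```python
# Component lists are illustrative assumptions, not a real recipe database.
COFFEE = {"water", "mug", "coffeemaker", "filter", "ground coffee"}
TEA = {"water", "mug", "kettle", "tea bag"}


def plan_transfer(known: set, target: set) -> dict:
    """Split the target task into parts the robot can reuse and parts it must learn."""
    return {
        "reuse": sorted(known & target),   # already familiar from the known task
        "new": sorted(target - known),     # must be found and figured out from scratch
        "drop": sorted(known - target),    # known-task items that no longer apply
    }


print(plan_transfer(COFFEE, TEA))
```

Real generalization is far messier than a set difference, of course; the robot also has to work out that a kettle plays roughly the role the coffeemaker did. But the comparison shows where the improvisation has to happen.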

Wozniak has given us much to work with here, and his coffee test forces us to seriously consider what general intelligence requires. He has also prompted us to realize how much, in human-machine communication, we are actually filling in ourselves – the bits that have gradually rendered the classic Turing Test unreliable. Thanks for this, Steve!

Then again, it’s possible that once Starbucks has phased in robots in all its locations, this test, too, might come to seem passé. What next? We can imagine sending the robot into an average American home with instructions to do the laundry. Or paint the den. Or clean up a teenager’s room...
