I’m working through my music library in chronological order, which has been a thrilling rediscovery of my favorite music in its original context. When 1975 came around, I asked Siri to cue up a Soft Machine album and the resulting failure was another fascinating lesson in interaction design, in the spirit of my original Siri Teaches Interaction Design post. Let’s dissect it!
The oldest user interfaces are command-lines — textual dialogs with a system that pedantically interprets everything exactly as you type it, and gives up entirely if you type something it doesn’t understand. They’re gloriously unambiguous, but require you to learn their hyper-precise, utterly unnatural language.
In contrast, conversational interfaces like Siri (and Cortana and Google Now) make that compromise the other way around: much more ambiguous, but with almost no special language to learn at all. Just speak your native language! All the pressure of making a computer understand a human is taken off of you and shifted onto the poor designers and engineers. Their first colossal challenge is recognizing your voice — what did you say? Can it hear you in a noisy room? How about if you’re one of the millions of people who mix languages in their daily life, or you want to contact someone with an uncommon name? What if you have an accent? Then, once they’ve got the words they think you said, comes the job of interpreting them — what did you mean?
Interpreting meaning is probably an artificial general intelligence problem: the day we finally solve it may well be the day that we birth a superintelligence that hopefully won’t kill us all. Until then, the best we can do is to put together fuzzy systems that use context and a library of prior knowledge to weight possibilities and ultimately guess what your words probably mean.
Anyway, that Siri failure. I figured that asking for the album Bundles would be good enough, considering that I only have one album by that name in my music library. Instead I got a single from an artist I’d never heard of, and it couldn’t even play. What surprised me about the result was how many distinct prioritization steps it seems not to have done:
Not making any of these prioritizations suggests that what’s actually going on in this system is an old-fashioned, linear, command-line-like process: Do a music search; get the results; pass the first result to the player; if something goes wrong, return an error and give up. But a conversational interface demands a more organic, probabilistic process: Do a search; get the results; weight the results based on context and prior knowledge; try the best match; if something goes wrong, go back and try something else.
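To make the contrast concrete, here’s a minimal Python sketch of both processes. It is an illustration only, not a description of how Siri actually works. Everything in it is hypothetical: the `Result` fields, the `play` stand-in, and especially the weights in `score`, which I made up to reflect the signals from my Bundles request (is it in my library, is it an album like I asked for, is it an artist I know).

```python
from dataclasses import dataclass


@dataclass
class Result:
    # Hypothetical search result; field names are assumptions for this sketch.
    title: str
    artist: str
    kind: str            # "album" or "single"
    in_library: bool
    artist_known: bool
    playable: bool


class PlaybackError(Exception):
    pass


def play(result):
    """Stand-in for a real player: fails on unplayable items."""
    if not result.playable:
        raise PlaybackError(result.title)
    return f"Playing {result.title} by {result.artist}"


def linear_pipeline(results):
    """Command-line style: take the first result, give up on any failure."""
    if not results:
        return "Nothing found"
    try:
        return play(results[0])
    except PlaybackError:
        return "Sorry, something went wrong"


def score(result, wanted_kind):
    """Weight a result by context. The weights are made-up illustrations."""
    s = 0.0
    if result.in_library:
        s += 3.0
    if result.kind == wanted_kind:
        s += 2.0
    if result.artist_known:
        s += 1.0
    return s


def conversational_pipeline(results, wanted_kind):
    """Rank by context, try the best match, fall back to the next one."""
    ranked = sorted(results, key=lambda r: score(r, wanted_kind), reverse=True)
    for candidate in ranked:
        try:
            return play(candidate)
        except PlaybackError:
            continue  # go back and try something else
    return "Sorry, nothing playable matched"


# The search happens to return the unplayable single first.
results = [
    Result("Bundles", "Unknown Artist", "single", False, False, False),
    Result("Bundles", "Soft Machine", "album", True, True, True),
]
print(linear_pipeline(results))                   # Sorry, something went wrong
print(conversational_pipeline(results, "album"))  # Playing Bundles by Soft Machine
```

The two functions see exactly the same search results. The only differences are the weighting step and the willingness to try again, and those are enough to turn a dead end into the album I asked for.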
Thank you. Be well!