I got bored about halfway through this article (probably around the time it actually started talking about how to design visually for voice), but the first quarter has some really nice stats on voice usage. However, I take umbrage at the following image in the article.
The above image makes it seem like voice UI is the next evolution after the GUI, whereas I'd argue they are entirely different tracks, and that voice, as we interact with it now, mirrors what the CLI was back then (or still is): single process, command driven, with a few flags or variables to augment or add clarity to the initial command. Voice is waiting to hit its GUI phase (which would be more conversational and less formal). That's when voice truly hits the mainstream: when users can easily move back and forth between points in a conversation, with the device retaining the context of the command, like we do in a natural conversation. Not everything is spelled out linearly and actioned straight away in a natural conversation; I'd wager that's because our brains don't tend to work this way. We blurt out the first thing and then add finer details as needed. Dictating this text would prove far more difficult than typing it, given the number of edits and corrections that naturally come out of typing as part of the process.
Well, duh, right? But who is writing articles with voice? No one. That was just one point about how limiting the linear nature of voice commands is in its present incarnation. Voice in its current state is good for certain tasks, just as you'd rather use a laptop for some tasks than a mobile. One doesn't negate or replace the other; combined, they augment our own capabilities.
Make no mistake though: one day voice UI will have its GUI moment at its own Xerox PARC, and it won't have much to do with the lessons learnt from the mousey pointer. It'll be a new metaphor, moving away from the desktop; instead it will be akin to a conversation with your best friend.