October 18, 2016

5 Design Tools for Voice UX

The number of products with a voice component grows daily – just look at the recent announcements of Google Home, the Amazon Echo Dot, and Samsung’s acquisition of Viv. Yet it’s hard to find examples of design tools and artifacts in the voice UX space.

Why We Need Voice Design Tools

Great user experiences don’t just happen, not on screens and not in voice or chat.

Just because voice interaction is a relatively new technology doesn’t mean the fundamentals of user-centered design don’t apply. – Kathryn Whitenton, The Most Important Design Principles of Voice UX

Whenever we create a product or service, whether it’s screen- or voice-based, we aim for an exceptional user experience. We want to mock up ideas, vet them, and adjust before we commit to building, so we need tools and artifacts to support that effort.

Here are some of the tools and artifacts we’ve created during our time designing, evaluating, and developing voice UX.

1. Text Scenarios

A staple in the UX toolbox, scenarios are great for any sort of conversational UI, too. They let us quickly flesh out the high-level interaction between user and product.

Example of a dialog between a person and a voice device (Alexa):

Person: Alexa, ask Seattle Ferry, how much space is there on the next ferry from Seattle to Bainbridge?
Alexa: There are 12 car spaces available on the 5:15 sailing from Seattle to Bainbridge.
Person: How about the one after that?
Alexa: There are 68 car spaces available on the 6:10 sailing.
Person: How many car spaces on the first one? (or: Is the 2:35 full? or: How many spaces are available on the 2:35?)
Alexa: There are 12 spaces on the 2:35 sailing from Seattle to Bainbridge. (or: There are no car spaces available on the 2:35 sailing from Seattle to Bainbridge, but there are 63 car spaces available on the 3:15 sailing.)

Voice dialog in a multi-modal experience. The sample dialog (input of measurements) is overlaid on the wireframe of the associated screen.

2. Storyboards

Storyboards show the interaction in context, whether it’s in the kitchen, meeting room, or on the go.

Two guys playing the card game Magic: The Gathering at a table. An Amazon Echo device (“Alexa”) is on the table.

Player 1: L.O.L. You can’t do that!
Player 2: WHAAAT?!?!
Player 1: Alexa, what are the rules for phasing and tokens?
Alexa: Tokens in the phased-out zone cease to exist. This is a state-based effect.
Player 1: What about Equipment?
Alexa: Any phased-out Auras or Equipment that were attached to those tokens remain phased out for the rest of the game.
Player 1: Knew it!

A one-panel storyboard showing a voice interaction in context.

Driver: Hey Siri, how do I get to the Showbox?
Siri (over car stereo): It’s in downtown Seattle. Would you like me to get you to the nearest parking facility?
Driver: Yeah, thanks!
Siri: Your friend Matt posted he would be at the Showbox, too. Do you want to send a message?

A simple multi-panel storyboard of voice interaction over time.

3. Videos

Videos add high-fidelity audio and visuals to the mix. They look and sound real, right down to the actual product voice, even though the entire dialog is scripted: responses are triggered manually using text-to-speech and the product voice module. (The same approach of serving canned phrases also works well for Wizard-of-Oz-style user testing.)
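
A minimal sketch of how an operator might serve canned phrases during a scripted video shoot or a Wizard-of-Oz session. The key bindings, the fallback phrase, and the `speak` hook are all assumptions for illustration; `speak` defaults to `print` and would be swapped for whatever text-to-speech call the product voice actually offers.

```python
# Canned responses keyed by the shortcut the operator presses.
# The phrases come from the ferry scenario; the key bindings are hypothetical.
CANNED = {
    "1": "There are 12 car spaces available on the 5:15 sailing from Seattle to Bainbridge.",
    "2": "There are 68 car spaces available on the 6:10 sailing.",
}

# Assumed fallback for anything the script did not anticipate.
FALLBACK = "Sorry, I didn't catch that."


def run_session(keys, speak=print):
    """Play the canned phrase for each operator key press and return the transcript."""
    transcript = []
    for key in keys:
        phrase = CANNED.get(key.strip(), FALLBACK)
        speak(phrase)  # replace with a real text-to-speech call
        transcript.append(phrase)
    return transcript
```

For a live session, something like `run_session(iter(input, "q"))` would read key presses from the operator until they type `q`. The returned transcript is handy for reviewing what participants actually heard.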

4. Flow Maps

The conversational nature of voice UX makes for tangled flows. Unlike traditional, hierarchical phone tree dialogs, current mobile and ambient voice products are designed to be less scripted, so there are many branches and conditions that need to be represented for developers to build from.

Here’s a partial example from our Washington State Ferries Alexa skill:

Example of a flow map
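
One way to hand such branches and conditions to developers is to express a node of the flow as data. The sketch below is not how our skill is built; the node names, context keys, and the final "say_full" prompt are hypothetical, while the other two prompts follow the ferry scenario from earlier.

```python
# Each node lists (condition, next node) pairs, checked in order.
# Node names and context keys are assumptions made for this sketch.
FLOW = {
    "ask_space": [
        (lambda ctx: ctx["car_spaces"] > 0, "say_spaces"),
        (lambda ctx: ctx.get("next_time") is not None, "say_full_offer_next"),
        (lambda ctx: True, "say_full"),  # catch-all branch
    ],
}

PROMPTS = {
    "say_spaces": (
        "There are {car_spaces} car spaces available on the {time} sailing "
        "from {origin} to {destination}."
    ),
    "say_full_offer_next": (
        "There are no car spaces available on the {time} sailing from {origin} "
        "to {destination}, but there are {next_spaces} car spaces available "
        "on the {next_time} sailing."
    ),
    # Assumed wording for the case where no later sailing has room.
    "say_full": (
        "There are no car spaces available on the {time} sailing "
        "from {origin} to {destination}."
    ),
}


def next_node(node, ctx):
    """Return the first branch target whose condition holds for this context."""
    for condition, target in FLOW[node]:
        if condition(ctx):
            return target
    raise ValueError(f"No branch matched for node {node!r}")


def respond(node, ctx):
    """Pick the branch for this context and fill in its prompt."""
    return PROMPTS[next_node(node, ctx)].format(**ctx)
```

Keeping conditions and prompts in tables like these means a reviewer can compare them against the flow map branch by branch.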

5. Phrase Maps

Conversational UX also means dealing with the different ways in which people phrase their intent. For each user intent in the Flow Map, the Phrase Map identifies the phrasings or utterances the product will recognize.

Example of a phrase map

And it’s easy to imagine a script that can then turn these human-readable phrasing specs into the machine-readable format for development:

Example of Alexa-ready phrase formatting
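
As a rough idea of what such a script could look like, here is a sketch that expands a phrase map into one line per utterance, with the intent name first and slots in braces. The intent name, slot names, and the `(a|b)` notation for alternatives are assumptions for illustration, and the output layout should be checked against whatever format the target platform currently expects.

```python
import re
from itertools import product

# Hypothetical phrase map: intent name -> human-readable phrasing templates.
# "(a|b)" marks alternatives and "{Slot}" marks a slot; both are assumed notations.
PHRASE_MAP = {
    "GetSpaceIntent": [
        "(how much space is|how many spaces are) there on the next ferry "
        "from {Origin} to {Destination}",
        "is the {Time} (sailing|ferry) full",
    ],
}

# Matches one non-nested "(...)" group and captures its contents.
ALTERNATION = re.compile(r"\(([^()]*)\)")


def expand(template):
    """Return every concrete phrasing for a template with (a|b) groups."""
    # Splitting on a pattern with one capture group alternates
    # literal text (even indexes) and group contents (odd indexes).
    parts = ALTERNATION.split(template)
    choices = [
        part.split("|") if index % 2 else [part]
        for index, part in enumerate(parts)
    ]
    return ["".join(combo) for combo in product(*choices)]


def to_sample_utterances(phrase_map):
    """Flatten the phrase map into 'IntentName phrase' lines."""
    lines = []
    for intent, templates in phrase_map.items():
        for template in templates:
            for phrase in expand(template):
                lines.append(f"{intent} {phrase}")
    return lines
```

Calling `print("\n".join(to_sample_utterances(PHRASE_MAP)))` would write out the expanded list, so designers can keep editing the compact phrase map while developers consume the generated lines.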


At its core, interaction design is dialog design: the dialog between a person and technology, whether it’s mediated by a screen or a voice. We need design skills and artifacts to ensure we create great voice experiences, things that people will be able to use and want to keep using once the infatuation with the new technology has worn off.

Damon is principal designer at Blink UX. When not deconstructing UX wherever he finds it, he enjoys traveling and rock climbing, often at the same time.
