All Things In Moderation: Trade-Offs In Moderated & Unmoderated Usability Testing

At Blink we practice evidence-driven design. That means that the design recommendations and decisions we make are grounded in solid data and sound reasoning. But what counts as good evidence? What are the data and reasoning that stand behind a well-motivated design decision?

For the majority of our projects, our evidence comes from qualitative, observational research. Blink researchers are also fluent with quantitative methods, which we use in combination with qualitative methods to deliver the insights that will best meet our clients’ design and product development needs. In this blog series, Blink researchers describe the motivations behind our methods, explore the most effective ways of using qualitative and quantitative methods to address particular UX questions, and explain how we ensure the rigor of our research, regardless of the methods we use.

In our first installment, Siri Mehus takes on the subject of unmoderated usability research: When should a live researcher facilitate research sessions and when is remote unmoderated usability testing a reasonable choice?


New Tools For Usability Research

In recent years, a number of tools and services for conducting remote usability tests have emerged in the marketplace including Userzoom, Usertesting.com, Userlytics, TryMyUI, and others. These tools make it possible to test products with real users even if you don’t have access to a usability lab or a trained researcher to run the sessions, and without spending days or weeks recruiting participants. While this is clearly preferable to the alternative of not testing with users at all, how does it compare to the “gold standard” of in-lab moderated usability testing? Is this relatively inexpensive approach a typical case of “you get what you pay for?” Or are there situations when it can deliver the value we expect from full-service usability testing, but at a bargain price and breakneck speed?

Standard Usability Testing

While approaches and projects differ, usability testing as typically performed has some key components:

  • Users perform a set of tasks with a product or prototype.
  • Sessions are moderated in-person by a trained researcher.
  • Participants are asked to “think out loud” about what they are seeing and doing.
  • Testing takes place in a usability lab.
  • Participants’ talk and actions are recorded to facilitate later analysis.

This standard usability testing model can be, and frequently is, adapted by removing one or more of these elements. For instance, the think-aloud protocol may be set aside when time on task is being measured and participants’ vocalizations could interfere with task performance. Usability tasks might be performed in a setting other than a usability lab, e.g., as part of a field study. Or tests may be conducted remotely rather than in person when it is important to recruit participants from a wide geographical area; in those cases, a teleconferencing tool can be used to approximate the in-person moderated protocol at a distance.

A more significant divergence from the standard plan is to conduct sessions without moderation from a trained researcher. There are some good reasons for taking this approach, some reasons that sound good but really aren’t, and some definite drawbacks.

Remote Unmoderated Usability Testing

Remote unmoderated testing differs from the standard model in several respects:

  • Participants perform tasks on a website or prototype using their own computers or devices.
  • Sessions are not moderated; rather, participants follow directions that have been created beforehand.
  • Participants’ clicks and taps, pathways, page views, time on task, or other metrics can be tracked electronically.
  • Participants may be asked to provide ratings of satisfaction and ease of use or qualitative feedback in a survey or text entry box.
  • Some tools allow testing of apps and websites on mobile devices.
  • Participants’ screens and participants themselves may or may not be video-recorded for subsequent analysis.

This last point is a key distinction between types of unmoderated testing. Eliminating audio, video, or screen-recording further simplifies the process, but it also removes the possibility of learning from participants by directly observing their activity and listening to their commentary.

The Promise of Remote Unmoderated Usability Testing

Remote unmoderated testing offers some very attractive benefits. The first is cost: lab rental, recording equipment, and, perhaps most significantly, the time and labor of a skilled research moderator can all be avoided.

The speed of remote unmoderated usability testing also contributes to its appeal in the lean tech environment. Recruiting and scheduling qualified participants for an in-person usability test can take two to three weeks. In contrast, if it is a simple recruit, completed tests may start to come in within hours of submitting to a remote usability testing service. The duration of testing can also be shorter, because multiple participants can take the test at the same time. This quick turnaround can be of great value to teams moving and iterating at a fast pace.

Lower costs and faster turnaround open up the possibility for testing with larger numbers of participants than a standard lab usability study. This sounds like a great advantage, but should be considered carefully. Larger numbers do not necessarily provide better answers. They can give you greater confidence in your answers to some questions, but those may not be the questions that product teams need answered.
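The “greater confidence in some answers” point can be made concrete with a little arithmetic. As an illustration of our own (not a calculation from the article): if a usability problem affects a fraction p of the user population, the chance that a test with n participants surfaces it at least once is 1 − (1 − p)^n.

```typescript
// Illustration only: if a usability problem affects a fraction `p` of users,
// the probability that at least one of `n` independent test participants
// encounters it is 1 - (1 - p)^n.
function chanceOfSeeingProblem(p: number, n: number): number {
  return 1 - Math.pow(1 - p, n);
}

// Five participants surface a problem affecting 31% of users about 84% of
// the time; fifty participants make detection a near certainty. Neither
// number, however, explains why the problem occurs.
console.log(chanceOfSeeingProblem(0.31, 5).toFixed(2));  // "0.84"
console.log(chanceOfSeeingProblem(0.31, 50).toFixed(2)); // "1.00"
```

This is the sense in which larger samples buy confidence: in how often a problem occurs, not in why it occurs.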

False Promises?

Other reasons are sometimes given for taking this approach, but on careful consideration they do not hold up as strong motivations for pursuing unmoderated testing.

Avoiding Effort

Remote unmoderated usability testing is cheaper and faster. Is it also easier? It might seem to be, as it does not require a moderator to be present for each session, asking questions, taking notes, and recording key observations. But if sessions are being recorded, they will still need to be reviewed, which could easily take just as long as running the sessions in the first place (and likely will be a lot less interesting).

You can save reviewing time by using metrics to identify key moments to observe directly. This requires careful study design to ensure that the appropriate metrics are being captured and used to trigger review. More importantly, it introduces the risk of missing valuable user feedback.

Certainly, workload can be reduced significantly by not recording user actions (or not reviewing recordings), but there will be a concomitant reduction in the value to be gained from the usability tests.

If you opt not to record sessions, other means must be used to determine what users do and whether they are successful. Asking users to report their own success or failure and comment on their experience is unreliable, at best. Event tracking may be limited. In a recent remote study of a mobile website, we found that we could only track participant actions that corresponded to a unique URL. We were able to overcome that limitation and detect participant actions with dropdown menus and carousels by adding code to the prototype, but putting that in place added another layer of investment of time and skilled effort.
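As a rough sketch of the kind of instrumentation described above, the snippet below logs interactions that do not change the URL, such as opening a dropdown or advancing a carousel. The names, log structure, and simulated events here are hypothetical illustrations, not the code we actually added to the prototype.

```typescript
// Hypothetical sketch of prototype instrumentation for tracking actions
// that don't produce a unique URL (dropdowns, carousels, etc.).
type Interaction = { action: string; target: string; timestamp: number };

const interactionLog: Interaction[] = [];

function logInteraction(action: string, target: string): void {
  // In a real prototype this might POST to the testing tool's collection
  // endpoint; here we simply record events locally.
  interactionLog.push({ action, target, timestamp: Date.now() });
}

// In the browser, calls like these would be wired to UI event listeners, e.g.
//   dropdown.addEventListener("click", () => logInteraction("open", "sort-menu"));
// Here we simulate two such non-navigating interactions:
logInteraction("open", "sort-menu");
logInteraction("advance", "hero-carousel");

console.log(interactionLog.length); // 2
```

Even this small amount of instrumentation has to be planned, written, and verified, which is the “another layer of investment” the paragraph above refers to.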

Whether or not recordings are made, remote unmoderated tests must be carefully designed and written because it is impossible to clarify intent or make adjustments in the process of testing. An ambiguous question or misleading instruction can easily result in useless data if it leads participants in the wrong direction. Studies need to be written with the greatest attention to possible misinterpretation and should be piloted or pre-tested with several participants before release to a larger group.

In short, remote unmoderated usability testing can be easier and faster than in-lab testing, but doing it well requires concerted effort by someone with significant expertise.

Removing Bias

Some advocate unmoderated testing because moderators can inadvertently bias participants through the way they word their questions or respond to participant statements or actions. However, poorly worded session instructions in an unmoderated test can introduce as much bias as a poorly trained moderator.

Moderation actually offers opportunities to understand and thereby reduce the effect of participant bias. A competent moderator will probe participant responses in order to uncover biases that may be informing their responses.

For example, we sometimes hear participants express preferences for particular companies or brands. As researchers, we want to know why. If a participant says they would be more likely to trust Amazon, we will probe. If a Seattle resident says it is because Amazon is “local,” we recognize that the participant is displaying a bias that is not representative of the company’s global customers.

Moderated tests allow us to start with open-ended questions and then probe to make sure that particular issues of interest are addressed. In contrast, asking open-ended questions in an unmoderated test is riskier because the participant may take them in a direction that is of little value to the researcher. The test creator may thus choose more closed or directive questions, which can limit the range of possible answers.

In short, moderation is not inherently more biased than unmoderated testing and offers many advantages for better understanding participant motivations for acting or answering in a particular way (in other words, their biases).

Improving Validity

The idea that unmoderated testing can provide more valid results rests on the possibility of testing with larger numbers.

Testing with larger numbers of participants lets us estimate with greater confidence how many users will experience a particular problem. It does not, however, provide more information about why users are having that problem, or how to solve it. While it can be useful to know how prevalent an issue is, the information that is usually most valuable to product teams concerns what causes the problem and how to prevent it. That is where moderated testing delivers the best insights.

Where Unmoderated Usability Testing Falls Short

Understanding Why

Unmoderated testing is useful for learning whether or not users are able to navigate a site or use a product. It can identify where things go wrong and, if a large enough sample is recruited, predict how frequently particular problems will occur. However, this is not where designers and product teams tend to get the most value from usability testing. Understanding why users are having a particular problem provides direct guidance for making design changes.

A very simple example:

Unmoderated usability testing reveals that no one is clicking on a call to action.

The product team decides they need to make it more prominent by changing the call to action from a text link on the side of the screen to a big green button at the top.

Versus

Moderated usability testing uncovers that no one is clicking on the call to action.

Participants indicate that they do not think that is the right place to complete the action they want to perform.

The product team understands that they need to change the labeling in order to convey the purpose of this link.

Making a call to action more prominent will not help if users do not believe it is the right place to go to perform the action they want to accomplish. In the first scenario, research and design resources are wasted because, while testing uncovered a problem, it did not uncover the reason behind the problem.

Adaptability

Moderated testing provides the opportunity to find out why a user makes a particular choice, understand their interpretation of what they see, and follow a user down another path, which can lead to unexpected insights.

When we notice that a participant has fallen silent, understandably focused on the challenging task we have asked them to complete, we can gently remind them to verbalize their impressions and intended actions.

In an artificial scenario, it is a challenge to ensure that users act as they naturally would. A lab may not be a natural setting, but a co-present moderator can remind and encourage participants to take tasks seriously and truly imagine themselves in the envisioned situation. Participants engaging in these tasks on their own in an unmoderated environment may be more likely to simply “go through the motions” in order to finish a test and collect their payment.

Conclusion

We welcome the emergence of new tools for conducting usability research; we have used remote unmoderated testing in our work for clients with success, and will do so again. We are always excited about new ideas for learning from users and see possibilities for ways this approach can be modified and expanded to improve its value and minimize some of its drawbacks.

More than anything, we believe in using the right tool for the right job. Understanding ‘why’ is always number one for us, but there are times when gathering ‘what’ or ‘how often’ data in a lightweight way can offer clients more comprehensive insights. Remote unmoderated usability testing is one tool we can use to do this, but we know it isn’t always the best way to answer the questions our clients care most about.
