Friess, Erin. “Discourse Variations Between Usability Tests and Usability Reports.” Journal of Usability Studies 6, no. 3 (2011): 102-16.
The article “Discourse Variations Between Usability Tests and Usability Reports” documents research that used discourse analysis techniques to compare the language used by end users with the language used in the evaluators’ oral reports. Friess conducted five rounds of formative testing with three pairs of team members, all novices at conducting usability tests; each pair was assigned a participant. A team of raters read the transcripts, watched the video recordings, and “determined if any of the issues mentioned in the oral reports but not mentioned by the usability participant could be reasonably assumed by the actions of the usability participant in the think-aloud usability test” (105). Overall, 83.9% of the findings had some basis in the usability testing; of these, however, 65% were accurate findings and 34.6% were potentially inaccurate findings (106). Data for both the accurate and the inaccurate findings came from sound bites and from evaluator interpretation. The discussion portion of Friess’s article comments on the gatekeeper role of the evaluators and how this powerful role could explain why differences exist between the language used by end users and the language used in evaluators’ oral reports. Four possible explanations include the following:
- Confirmation Bias in Oral Reports – The evaluators appeared to seek out confirmation for issues they had previously identified (110)
- Bias in What’s Omitted in the Usability Reports – The evaluators at no time presented findings to the group that ran counter to a claim the evaluators had made at a previous meeting (111)
- Biases in Client Desires – The evaluators did not mention a specific participant desire for an index because the client had already specified that including an index was not an option (112)
- Poor Interpretation Skills – The evaluators had little experience with this kind of study and therefore clung to the few pieces of data that they understood well (sound-bite data) (113)
Honestly, I found this article somewhat surprising. It bothers me that so many evaluators will orally communicate information about participants’ tests without actually referencing the transcript or notes from the test itself. Relying solely on memory, particularly when specific biases are at stake, has proven faulty time and time again. For example, my client for my STEM brief is currently studying the consistency of positive flashbulb memories over time. Her research indicates that as time passes, individuals maintain a strong belief in the veracity of their memories even as the consistency of those memories actually decreases. Even though the team members conducting the tests do not experience flashbulb memories while monitoring or observing the participants, there is still an element of unreliability in presenting information from memory rather than from documented data. The whole process of learning to interpret results accurately and without bias is a crucial component of good practice in technical communication. This study reminds me that certain parameters need to be in place even in the semi-informal testing environment of do-it-yourself usability practice. How could conducting experiments in this fashion hurt the credibility of technical communication as a field? What are some safeguards to put into place to avoid such biases and discrepancies in future do-it-yourself usability tests?