Usability testing of a healthcare chatbot: Can we use conventional methods to assess conversational user interfaces?

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Chatbots are becoming increasingly popular as a human-computer interface. The traditional best practices normally applied to User Experience (UX) design cannot easily be applied to chatbots, nor can conventional usability testing techniques guarantee accuracy. WeightMentor is a bespoke self-help motivational tool for weight loss maintenance. This study addresses the following four research questions: How usable is the WeightMentor chatbot, according to conventional usability methods? To what extent do different conventional usability questionnaires correlate when evaluating chatbot usability, and how do they correlate with a tailored chatbot usability survey score? What is the optimum number of users required to identify chatbot usability issues? How many task repetitions are required for first-time chatbot users to reach optimum task performance (i.e. efficiency based on task completion times)? This paper describes the procedure for testing the WeightMentor chatbot, assesses correlation between typical usability testing metrics, and suggests that conventional wisdom on participant numbers for identifying usability issues may not apply to chatbots. The study design was a usability study. WeightMentor was tested using a pre-determined usability testing protocol, evaluating ease of task completion, unique usability errors and participant opinions on the chatbot (collected using usability questionnaires). WeightMentor usability scores were generally high, and correlation between questionnaires was strong. The optimum number of users for identifying chatbot usability errors was 26, which challenges previous research. Chatbot users reached optimum proficiency in tasks after just one repetition. Usability test outcomes confirm what is already known about chatbots: that they are highly usable (due to their simple interface and conversation-driven functionality), but conventional methods for assessing usability and user experience may not be as accurate when applied to chatbots.
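The finding of 26 users challenges the widely cited "five users" heuristic, which rests on the Nielsen-Landauer problem-discovery model: the fraction of usability problems found by n users is 1 - (1 - p)^n, where p is the average probability that a single test user encounters a given problem. The paper's exact estimation method is not given in this record; the following minimal Python sketch, with hypothetical p values, illustrates how the required sample size grows as p falls:

# Nielsen-Landauer problem-discovery model:
#   found(n) = 1 - (1 - p)**n
# where p is the average probability that one test user
# encounters any given usability problem.

def proportion_found(n_users: int, p: float) -> float:
    """Expected fraction of usability problems uncovered by n_users."""
    return 1.0 - (1.0 - p) ** n_users

def users_needed(target: float, p: float) -> int:
    """Smallest number of users expected to uncover `target` of the problems."""
    n = 1
    while proportion_found(n, p) < target:
        n += 1
    return n

# With the commonly cited p = 0.31 (Nielsen & Landauer, 1993),
# five users already uncover over 80% of problems:
print(users_needed(0.80, p=0.31))   # -> 5
# A hypothetical, much lower per-user discovery rate pushes the
# requirement into the high twenties, close to the 26 users this
# study reports:
print(users_needed(0.85, p=0.07))   # -> 27

Under this model, a figure of 26 would be consistent with individual chatbot usability problems being encountered far less often per session than in conventional GUI testing.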
Language: English
Title of host publication: ECCE 2019 Proceedings of the 31st European Conference on Cognitive Ergonomics
Subtitle of host publication: "Design for Cognition"
Pages: 207-214
Number of pages: 8
ISBN (Electronic): 9781450371667
DOIs: https://doi.org/10.1145/3335082.3335094
Publication status: Published - 10 Sep 2019
Event: 31st European Conference on Cognitive Ergonomics: Design for Cognition - Belfast, United Kingdom
Duration: 10 Sep 2019 - 13 Sep 2019
https://www.ulster.ac.uk/conference/european-conference-on-cognitive-ergonomics

Conference

Conference: 31st European Conference on Cognitive Ergonomics
Abbreviated title: ECCE 2019
Country: United Kingdom
City: Belfast
Period: 10/09/19 - 13/09/19
Internet address: https://www.ulster.ac.uk/conference/european-conference-on-cognitive-ergonomics

Keywords

  • Usability Testing
  • Chatbots
  • Conversational UI
  • UX Testing

Cite this

Holmes, W., Moorhead, A., Bond, RR., Zheng, H., Coates, V., & McTear, M. (2019). Usability testing of a healthcare chatbot: Can we use conventional methods to assess conversational user interfaces? In ECCE 2019 Proceedings of the 31st European Conference on Cognitive Ergonomics: "Design for Cognition" (pp. 207-214). https://doi.org/10.1145/3335082.3335094
Holmes, William ; Moorhead, Anne ; Bond, RR ; Zheng, Huiru ; Coates, Vivien ; McTear, Mike. / Usability testing of a healthcare chatbot: Can we use conventional methods to assess conversational user interfaces? ECCE 2019 Proceedings of the 31st European Conference on Cognitive Ergonomics: "Design for Cognition". 2019. pp. 207-214
@inproceedings{2d17849fa34945daaf9f019f850cf28d,
title = "Usability testing of a healthcare chatbot: Can we use conventional methods to assess conversational user interfaces?",
abstract = "Chatbots are becoming increasingly popular as a human-computer interface. The traditional best practices normally applied to User Experience (UX) design cannot easily be applied to chatbots, nor can conventional usability testing techniques guarantee accuracy. WeightMentor is a bespoke self-help motivational tool for weight loss maintenance. This study addresses the following four research questions: How usable is the WeightMentor chatbot, according to conventional usability methods? To what extent do different conventional usability questionnaires correlate when evaluating chatbot usability, and how do they correlate with a tailored chatbot usability survey score? What is the optimum number of users required to identify chatbot usability issues? How many task repetitions are required for first-time chatbot users to reach optimum task performance (i.e. efficiency based on task completion times)? This paper describes the procedure for testing the WeightMentor chatbot, assesses correlation between typical usability testing metrics, and suggests that conventional wisdom on participant numbers for identifying usability issues may not apply to chatbots. The study design was a usability study. WeightMentor was tested using a pre-determined usability testing protocol, evaluating ease of task completion, unique usability errors and participant opinions on the chatbot (collected using usability questionnaires). WeightMentor usability scores were generally high, and correlation between questionnaires was strong. The optimum number of users for identifying chatbot usability errors was 26, which challenges previous research. Chatbot users reached optimum proficiency in tasks after just one repetition. Usability test outcomes confirm what is already known about chatbots: that they are highly usable (due to their simple interface and conversation-driven functionality), but conventional methods for assessing usability and user experience may not be as accurate when applied to chatbots.",
keywords = "Usability Testing, Chatbots, Conversational UI, UX Testing",
author = "William Holmes and Anne Moorhead and RR Bond and Huiru Zheng and Vivien Coates and Mike McTear",
year = "2019",
month = "9",
day = "10",
doi = "10.1145/3335082.3335094",
language = "English",
isbn = "978-1-4503-7166-7",
pages = "207--214",
booktitle = "ECCE 2019 Proceedings of the 31st European Conference on Cognitive Ergonomics",

}

Holmes, W, Moorhead, A, Bond, RR, Zheng, H, Coates, V & McTear, M 2019, Usability testing of a healthcare chatbot: Can we use conventional methods to assess conversational user interfaces? in ECCE 2019 Proceedings of the 31st European Conference on Cognitive Ergonomics: "Design for Cognition", pp. 207-214, 31st European Conference on Cognitive Ergonomics, Belfast, United Kingdom, 10/09/19. https://doi.org/10.1145/3335082.3335094

Usability testing of a healthcare chatbot: Can we use conventional methods to assess conversational user interfaces? / Holmes, William; Moorhead, Anne; Bond, RR; Zheng, Huiru; Coates, Vivien; McTear, Mike.

ECCE 2019 Proceedings of the 31st European Conference on Cognitive Ergonomics: "Design for Cognition". 2019. pp. 207-214.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

TY - GEN

T1 - Usability testing of a healthcare chatbot: Can we use conventional methods to assess conversational user interfaces?

AU - Holmes, William

AU - Moorhead, Anne

AU - Bond, RR

AU - Zheng, Huiru

AU - Coates, Vivien

AU - McTear, Mike

PY - 2019/9/10

Y1 - 2019/9/10

N2 - Chatbots are becoming increasingly popular as a human-computer interface. The traditional best practices normally applied to User Experience (UX) design cannot easily be applied to chatbots, nor can conventional usability testing techniques guarantee accuracy. WeightMentor is a bespoke self-help motivational tool for weight loss maintenance. This study addresses the following four research questions: How usable is the WeightMentor chatbot, according to conventional usability methods? To what extent do different conventional usability questionnaires correlate when evaluating chatbot usability, and how do they correlate with a tailored chatbot usability survey score? What is the optimum number of users required to identify chatbot usability issues? How many task repetitions are required for first-time chatbot users to reach optimum task performance (i.e. efficiency based on task completion times)? This paper describes the procedure for testing the WeightMentor chatbot, assesses correlation between typical usability testing metrics, and suggests that conventional wisdom on participant numbers for identifying usability issues may not apply to chatbots. The study design was a usability study. WeightMentor was tested using a pre-determined usability testing protocol, evaluating ease of task completion, unique usability errors and participant opinions on the chatbot (collected using usability questionnaires). WeightMentor usability scores were generally high, and correlation between questionnaires was strong. The optimum number of users for identifying chatbot usability errors was 26, which challenges previous research. Chatbot users reached optimum proficiency in tasks after just one repetition. Usability test outcomes confirm what is already known about chatbots: that they are highly usable (due to their simple interface and conversation-driven functionality), but conventional methods for assessing usability and user experience may not be as accurate when applied to chatbots.

AB - Chatbots are becoming increasingly popular as a human-computer interface. The traditional best practices normally applied to User Experience (UX) design cannot easily be applied to chatbots, nor can conventional usability testing techniques guarantee accuracy. WeightMentor is a bespoke self-help motivational tool for weight loss maintenance. This study addresses the following four research questions: How usable is the WeightMentor chatbot, according to conventional usability methods? To what extent do different conventional usability questionnaires correlate when evaluating chatbot usability, and how do they correlate with a tailored chatbot usability survey score? What is the optimum number of users required to identify chatbot usability issues? How many task repetitions are required for first-time chatbot users to reach optimum task performance (i.e. efficiency based on task completion times)? This paper describes the procedure for testing the WeightMentor chatbot, assesses correlation between typical usability testing metrics, and suggests that conventional wisdom on participant numbers for identifying usability issues may not apply to chatbots. The study design was a usability study. WeightMentor was tested using a pre-determined usability testing protocol, evaluating ease of task completion, unique usability errors and participant opinions on the chatbot (collected using usability questionnaires). WeightMentor usability scores were generally high, and correlation between questionnaires was strong. The optimum number of users for identifying chatbot usability errors was 26, which challenges previous research. Chatbot users reached optimum proficiency in tasks after just one repetition. Usability test outcomes confirm what is already known about chatbots: that they are highly usable (due to their simple interface and conversation-driven functionality), but conventional methods for assessing usability and user experience may not be as accurate when applied to chatbots.

KW - Usability Testing

KW - Chatbots

KW - Conversational UI

KW - UX Testing

UR - http://www.scopus.com/inward/record.url?scp=85073159251&partnerID=8YFLogxK

U2 - 10.1145/3335082.3335094

DO - 10.1145/3335082.3335094

M3 - Conference contribution

SN - 978-1-4503-7166-7

SP - 207

EP - 214

BT - ECCE 2019 Proceedings of the 31st European Conference on Cognitive Ergonomics

ER -

Holmes W, Moorhead A, Bond RR, Zheng H, Coates V, McTear M. Usability testing of a healthcare chatbot: Can we use conventional methods to assess conversational user interfaces? In ECCE 2019 Proceedings of the 31st European Conference on Cognitive Ergonomics: "Design for Cognition". 2019. p. 207-214. https://doi.org/10.1145/3335082.3335094