WEDNESDAY, Oct. 4, 2023 (HealthDay News) — Chatbots generate mostly accurate responses to physician-developed medical queries, according to a study published online Oct. 2 in JAMA Network Open.
Rachel S. Goodman, from the Vanderbilt University School of Medicine in Nashville, Tennessee, and colleagues examined the accuracy and comprehensiveness of chatbot-generated responses to physician-developed medical queries. A total of 33 physicians across 17 specialties generated 284 questions that were classified as easy, medium, or hard and had binary (yes/no) or descriptive answers. The chatbot-generated answers were graded for accuracy (6-point Likert scale) and completeness (3-point Likert scale).
The researchers found that the median accuracy score across all questions was 5.5 (between almost completely and completely correct), with a mean score of 4.8 (between mostly and almost completely correct). The median completeness score was 3.0 (complete and comprehensive), with a mean score of 2.5. Median accuracy scores were 6.0, 5.5, and 5.0 for questions rated easy, medium, and hard, respectively (mean scores, 5.0, 4.7, and 4.6). Accuracy scores were similar for binary and descriptive questions (median, 6.0 versus 5.0; mean, 4.9 versus 4.7). Thirty-four of 36 questions with scores of 1.0 to 2.0 were requeried or regraded eight to 17 days later, with considerable improvement noted (median score improving from 2.0 to 4.0).
“While the chatbot-generated answers displayed high accuracy and completeness scores across various specialties, question types, and difficulty levels in this cross-sectional study, further development is needed to improve the reliability and robustness of these tools before clinical integration,” the authors write.
Several authors disclosed ties to the biopharmaceutical industry.
Copyright © 2023 HealthDay. All rights reserved.