The following is a summary of “Exploring AI-chatbots’ capability to suggest surgical planning in ophthalmology: ChatGPT versus Google Gemini analysis of retinal detachment cases,” published online in March 2024 in the British Journal of Ophthalmology by Carlà et al.
Researchers conducted a retrospective study to assess the ability of three publicly accessible large language models, Chat Generative Pre-trained Transformer (ChatGPT-3.5), ChatGPT-4, and Google Gemini, to analyze retinal detachment cases and recommend optimal surgical strategies.
They entered 54 records of retinal detachment cases into the ChatGPT and Gemini interfaces. Following the question, “Specify the surgical planning you would suggest and the eventual intraocular tamponade,” they collected the responses and evaluated their agreement with the consensus of three expert vitreoretinal surgeons. The answers provided by ChatGPT and Gemini were graded on a scale of 1 to 5 (poor to excellent quality) using the Global Quality Score (GQS).
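This grading workflow amounts to a simple per-model tabulation. The sketch below illustrates, in Python, how agreement with the expert consensus and the mean GQS could be computed; the record fields and example surgical plans are hypothetical illustrations, not taken from the study’s dataset.

```python
# Hypothetical sketch of the grading tabulation described above.
# Each record pairs the expert consensus plan with a chatbot's suggested
# plan and its Global Quality Score (GQS, 1-5). Field names and plans
# are illustrative assumptions, not the study's actual data.
import numpy as np

cases = [
    {"expert_plan": "PPV + SF6 tamponade", "chatbot_plan": "PPV + SF6 tamponade", "gqs": 4},
    {"expert_plan": "scleral buckling", "chatbot_plan": "PPV + C3F8 tamponade", "gqs": 3},
    # ... one entry per graded retinal detachment case
]

# Fraction of cases in which the chatbot's plan matched the expert consensus
agreement = np.mean([c["expert_plan"] == c["chatbot_plan"] for c in cases])

# Mean and sample standard deviation of the GQS grades
gqs = np.array([c["gqs"] for c in cases], dtype=float)
print(f"agreement = {agreement:.0%}, GQS = {gqs.mean():.1f} ± {gqs.std(ddof=1):.1f}")
```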
The results showed that, after excluding four controversial cases, 50 cases were included. The surgical choices of ChatGPT-3.5, ChatGPT-4, and Google Gemini agreed with those of the vitreoretinal surgeons in 40/50 (80%), 42/50 (84%), and 35/50 (70%) of cases, respectively. Google Gemini was unable to respond in five cases. Contingency analysis revealed a significant difference between ChatGPT-4 and Gemini (P=0.03). Mean GQS values were 3.9±0.8 for ChatGPT-3.5 and 4.2±0.7 for ChatGPT-4, while Gemini scored 3.5±1.1. There was no statistically significant difference between the two ChatGPT versions (P=0.22), while both outperformed Gemini (P=0.03 and P=0.002, respectively). The primary source of error was the choice of endotamponade (14% for both ChatGPT-3.5 and ChatGPT-4, and 12% for Google Gemini). Only ChatGPT-4 was able to suggest a combined phacovitrectomy approach.
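For readers who want to reproduce the kind of contingency analysis reported here, the sketch below runs a Fisher’s exact test on a 2×2 agree/disagree table for ChatGPT-4 versus Gemini, using the counts from this summary. The summary does not specify which test the authors used or how Gemini’s five non-responses were handled, so the resulting P-value need not match the published P=0.03.

```python
# Sketch of a contingency analysis on the agreement counts above.
# Assumption: Fisher's exact test on a 2x2 table. The summary does not
# name the authors' test or how Gemini's five non-responses were scored,
# so this P-value may differ from the published one.
from scipy.stats import fisher_exact

# Rows: ChatGPT-4, Google Gemini; columns: agreed, disagreed (of 50 cases)
table = [[42, 50 - 42],
         [35, 50 - 35]]

odds_ratio, p_value = fisher_exact(table)
print(f"odds ratio = {odds_ratio:.2f}, P = {p_value:.3f}")
```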
Investigators concluded that, while both AI chatbots analyzed the vitreoretinal cases well, ChatGPT’s recommendations were more accurate and of higher quality than Gemini’s.
Source: bjo.bmj.com/content/early/2024/03/06/bjo-2023-325143