-
Figure 1.
Study design and workflow. The workflow comprised three steps: (a) medical documentation and specialist standards used to construct the MH participant record and define the gold standard, (b) response generation and evaluation by three AI chatbots and an ophthalmology resident, and (c) outcome analyses, including diagnostic agreement, treatment suggestion agreement, and GQS. Abbreviations: MH, macular hole; OCT, optical coherence tomography; GQS, Global Quality Score.
-
Figure 2.
Diagnosis and treatment suggestion agreement for macular hole across ChatGPT-o3, Gemini 2.5 Pro, DeepSeek-R1, and an ophthalmology resident. (a) Diagnosis agreement. (b) Treatment suggestion agreement. Bars indicate the agreement rate, with exact Clopper–Pearson 95% confidence intervals (n = 50). Pairwise comparisons were performed using paired chi-square tests. ** p < 0.01; ns, not significant. Abbreviations: CI, confidence interval.
-
Figure 3.
Global Quality Score (GQS) across ChatGPT-o3, Gemini 2.5 Pro, DeepSeek-R1, and an ophthalmology resident. (a) Grader 1. (b) Grader 2. Bars show mean ± SD. Brackets indicate pairwise comparisons based on generalized estimating equations. * p < 0.05; ** p < 0.01; *** p < 0.001; ns, not significant. Abbreviations: GQS, Global Quality Score; SD, standard deviation.
-
Table 1.
Global quality score description.

| Score | Overall description |
|---|---|
| 1 | Poor quality, poor flow of the site, most information missing, not at all useful for patients |
| 2 | Generally poor quality and poor flow, some information listed but many important topics missing, of very limited use to patients |
| 3 | Moderate quality, suboptimal flow, some important information is adequately discussed but others poorly discussed, somewhat useful for patients |
| 4 | Good quality and generally good flow, most of the relevant information is listed, but some topics not covered, useful for patients |
| 5 | Excellent quality and excellent flow, very useful for patients |
-
Table 2.
Baseline clinical characteristics of participants with macular hole.

| Variable | Value | n (%) |
|---|---|---|
| Number of participants | 50 | N/A |
| Age (years) | 59.5 ± 9.9 | N/A |
| Sex (male/female) | 14/36 | 28/72 |
| Eye laterality (right/left) | 20/30 | 40/60 |
| Macular hole phenotype | | |
| LMH | 5 | 10 |
| FTMH | 37 | 74 |
| MH-RRD | 8 | 16 |
| Gass stage (FTMH only, n = 37) | | |
| Stage II | 9 | 24 |
| Stage III | 3 | 8 |
| Stage IV | 25 | 68 |
| Tamponade in reference plan [a] | | |
| Gas | 44 | 88 |
| Silicone oil | 4 | 8 |
| None | 2 | 4 |
| Ocular comorbidities [b] | | |
| Cataract | 8 | 16 |
| High myopia | 6 | 12 |
| Epiretinal membrane | 5 | 10 |
| Others | 3 | 6 |

[a] Percentages use the cohort size as denominator. [b] Comorbidities are not mutually exclusive. Abbreviations: LMH, lamellar macular hole; FTMH, full-thickness macular hole; MH-RRD, macular hole with rhegmatogenous retinal detachment.
-
Table 3.
Macular hole diagnosis and treatment suggestion agreement and global quality score.

| Evaluator | Diagnosis agreement (95% CI) | Diagnosis p | Treatment agreement (95% CI) | Treatment p | GQS, Grader 1 | GQS, Grader 2 |
|---|---|---|---|---|---|---|
| ChatGPT-o3 | 0.86 (73.3-94.2) | N/A | 0.92 (80.8-97.8) | N/A | 3.78 ± 0.65 | 3.78 ± 1.00 |
| Gemini 2.5 Pro | 0.80 (66.3-89.9) | 0.248 | 0.80 (66.3-89.9) | 0.077 | 4.02 ± 0.43 | 3.88 ± 1.00 |
| DeepSeek-R1 | 0.82 (68.6-91.4) | 0.617 | 0.86 (73.3-94.2) | 0.248 | 3.18 ± 1.26 | 3.14 ± 1.32 |
| Resident | 0.82 (68.6-91.4) | 0.803 | 0.70 (55.4-82.1) | 0.006 | 3.70 ± 0.65 | 3.50 ± 1.02 |

CI, confidence interval; GQS, Global Quality Score. Agreement p-values are from the paired chi-square test versus ChatGPT-o3. Both masked graders had ten years of ophthalmology clinical experience.
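
The exact Clopper–Pearson intervals cited for the agreement rates can be illustrated with a minimal sketch. It is not the authors' analysis code: the function names are hypothetical, and the agreement counts are an assumption, inferred by multiplying each reported rate by n = 50. The bounds are found by bisection on the binomial distribution using only the Python standard library.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(g, target, iters=60):
    """Bisection on [0, 1] for g(p) = target, where g is increasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for k successes in n trials."""
    # Lower bound: the p at which P(X >= k) equals alpha / 2 (0 when k == 0).
    lower = 0.0 if k == 0 else _solve_increasing(
        lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2
    )
    # Upper bound: the p at which P(X <= k) equals alpha / 2 (1 when k == n).
    upper = 1.0 if k == n else _solve_increasing(
        lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2
    )
    return lower, upper


if __name__ == "__main__":
    # Assumed counts: reported treatment agreement rate x 50 (not stated in the source).
    treatment_counts = {
        "ChatGPT-o3": 46,
        "Gemini 2.5 Pro": 40,
        "DeepSeek-R1": 43,
        "Resident": 35,
    }
    for name, k in treatment_counts.items():
        lo, hi = clopper_pearson(k, 50)
        print(f"{name}: {k / 50:.2f} ({100 * lo:.1f}-{100 * hi:.1f})")
```

Under these assumptions, the printed intervals can be compared directly with the treatment column of Table 3.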