    • Figure 1. 

      Study design and workflow. The workflow comprised three steps: (a) construction of the MH participant record and definition of the gold standard from medical documentation and specialist standards, (b) response generation and evaluation by three AI chatbots and an ophthalmology resident, and (c) outcome analyses, including diagnostic agreement, treatment suggestion agreement, and GQS. Abbreviations: MH, macular hole; OCT, optical coherence tomography; GQS, Global Quality Score.

    • Figure 2. 

      Diagnosis and treatment suggestion agreement for macular hole across ChatGPT-o3, Gemini 2.5 Pro, DeepSeek-R1, and an ophthalmology resident. (a) Diagnosis agreement. (b) Treatment suggestion agreement. Bars indicate the agreement rate, with exact Clopper–Pearson 95% confidence intervals (n = 50). Pairwise comparisons were performed using paired chi-square tests. ** p < 0.01; ns, not significant. Abbreviations: CI, confidence interval.
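The exact Clopper–Pearson interval named in the caption can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the authors' analysis code: the function names are hypothetical, and the counts in the usage loop (43, 40, 41, and 41 of 50) are inferred from the diagnosis agreement rates in Table 3 and n = 50, not taken from the source directly.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def bisect_root(f, lo=0.0, hi=1.0, iters=100):
    """Root of a continuous f on [lo, hi] whose sign differs at the endpoints."""
    lo_positive = f(lo) > 0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(mid) > 0) == lo_positive:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for a proportion k / n."""
    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    if k == 0:
        lower = 0.0
    else:
        lower = bisect_root(lambda p: 1 - binom_cdf(k - 1, n, p) - alpha / 2)
    # Upper bound: the p at which P(X <= k) equals alpha / 2.
    if k == n:
        upper = 1.0
    else:
        upper = bisect_root(lambda p: binom_cdf(k, n, p) - alpha / 2)
    return lower, upper


# Counts inferred from the reported diagnosis agreement rates (assumption).
inferred_counts = {"ChatGPT-o3": 43, "Gemini 2.5 Pro": 40,
                   "DeepSeek-R1": 41, "Resident": 41}
for name, k in inferred_counts.items():
    lo, hi = clopper_pearson(k, 50)
    print(f"{name}: {100 * k / 50:.1f}% ({100 * lo:.1f} to {100 * hi:.1f})")
```

The printed intervals can be compared against the 95% CI column of Table 3. Bisection on the binomial tail probabilities is used here only to avoid external dependencies; a beta-quantile formulation gives the same interval.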

    • Figure 3. 

      Global Quality Score across ChatGPT-o3, Gemini 2.5 Pro, DeepSeek-R1, and an ophthalmology resident. (a) Grader 1. (b) Grader 2. Bars show mean ± SD. Brackets indicate pairwise comparisons based on generalized estimating equations. * p < 0.05, ** p < 0.01, *** p < 0.001; ns, not significant. Abbreviations: GQS, Global Quality Score; SD, standard deviation.

    • Score  Overall description
      1      Poor quality, poor flow of the site, most information missing, not at all useful for patients
      2      Generally poor quality and poor flow, some information listed but many important topics missing, of very limited use to patients
      3      Moderate quality, suboptimal flow, some important information is adequately discussed but others poorly discussed, somewhat useful for patients
      4      Good quality and generally good flow, most of the relevant information is listed, but some topics not covered, useful for patients
      5      Excellent quality and excellent flow, very useful for patients

      Table 1. 

      Global quality score description.

    • Variable                            Value         %
      Number of participants              50            N/A
      Age (years)                         59.5 ± 9.9    N/A
      Sex (male/female)                   14/36         28/72
      Eye laterality (right/left)         20/30         40/60
      Macular hole phenotype
        LMH                               5             10
        FTMH                              37            74
        MH-RRD                            8             16
      Gass stage (FTMH only, n = 37)
        Stage II                          9             24
        Stage III                         3             8
        Stage IV                          25            68
      Tamponade in reference plan (a)
        Gas                               44            88
        Silicone oil                      4             8
        None                              2             4
      Ocular comorbidities (b)
        Cataract                          8             16
        High myopia                       6             12
        Epiretinal membrane               5             10
        Others                            3             6
      (a) Percentages use the cohort size as denominator. (b) Comorbidities are not mutually exclusive. Abbreviations: LMH, lamellar macular hole; FTMH, full-thickness macular hole; MH-RRD, macular hole with rhegmatogenous retinal detachment.

      Table 2. 

      Baseline clinical characteristics of participants with macular hole.

    • Evaluator       Diagnosis agreement        Treatment agreement        GQS (mean ± SD)
                      % (95% CI)         p       % (95% CI)         p       Grader 1      Grader 2
      ChatGPT-o3      86.0 (73.3–94.2)   N/A     92.0 (80.8–97.8)   N/A     3.78 ± 0.65   3.78 ± 1.00
      Gemini 2.5 Pro  80.0 (66.3–89.9)   0.248   80.0 (66.3–89.9)   0.077   4.02 ± 0.43   3.88 ± 1.00
      DeepSeek-R1     82.0 (68.6–91.4)   0.617   86.0 (73.3–94.2)   0.248   3.18 ± 1.26   3.14 ± 1.32
      Resident        82.0 (68.6–91.4)   0.803   70.0 (55.4–82.1)   0.006   3.70 ± 0.65   3.50 ± 1.02
      CI, confidence interval; GQS, Global Quality Score. Agreement p-values are from the paired chi-square test versus ChatGPT-o3. Both masked graders had ten years of ophthalmology clinical experience.

      Table 3. 

      Macular hole diagnosis and treatment suggestion agreement and global quality score.
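A paired chi-square test on binary agreement outcomes is commonly implemented as McNemar's test, which uses only the discordant pairs (cases where one evaluator agreed with the gold standard and the other did not). The sketch below assumes that form. The source does not report the discordant counts or state whether a continuity correction was applied, so the function name, the `continuity` flag, and the example counts are hypothetical and serve only to illustrate the calculation.

```python
from math import erfc, sqrt


def mcnemar_chi2(b, c, continuity=True):
    """McNemar chi-square statistic and p-value (1 degree of freedom).

    b: cases where only evaluator A agreed with the gold standard
    c: cases where only evaluator B agreed with the gold standard
    """
    if b + c == 0:
        return 0.0, 1.0  # no discordant pairs, so no evidence of a difference
    diff = abs(b - c)
    if continuity:
        diff = max(diff - 1, 0)  # Edwards continuity correction
    stat = diff**2 / (b + c)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value


# Hypothetical discordant counts, not taken from the source: the net
# difference of 11 matches 46 versus 35 concordant cases out of 50, but the
# actual split between b and c is not reported.
stat, p_value = mcnemar_chi2(b=12, c=1)
print(f"chi-square = {stat:.2f}, p = {p_value:.4f}")
```

With few discordant pairs, an exact binomial version of the test is often preferred over the chi-square approximation.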