CLINICAL RESEARCH

 

Evaluation of the Schatzker-Kfuri Classification of Tibial Plateau Fractures Using Radiographs and Computed Tomography: Comparison Between an Expert Observer and the ChatGPT-4o Model

 

Héctor A. Rivadeneira Jurado,* Elias A. Rivadeneira Jurado,* Daniel Espinoza Freire,* Andrés F. Samaniego,* Ezequiel Lulkin,* Fernando Bidolegui,** Sebastián Pereira*

*Orthopedics and Traumatology Service, Hospital Sirio-Libanés, Autonomous City of Buenos Aires, Argentina

**Orthopedics and Traumatology Service, Sanatorio Otamendi Miroli, Autonomous City of Buenos Aires, Argentina

 

ABSTRACT

Introduction: Artificial intelligence was formally introduced in 1956, and platforms trained on large datasets have since been developed to generate increasingly accurate outputs. The Schatzker-Kfuri classification of tibial plateau fractures enables more precise analysis, particularly when CT imaging is integrated. This study compared the diagnostic accuracy of the ChatGPT-4o model with that of expert evaluators. Materials and Methods: A retrospective observational study was conducted to compare the interpretations of an expert observer with those generated by ChatGPT-4o. Forty-five expert-published case reports, each including radiographs and CT scans and retrieved from databases such as PubMed, Elsevier, and SciELO, were used to refine the prompt guiding ChatGPT-4o’s analysis. Six additional case reports of tibial plateau fractures, none previously provided to the model, were then selected for evaluation. ChatGPT-4o analyzed each case and proposed a classification according to the Schatzker-Kfuri system, and its responses were compared with the expert classifications reported in the literature. Results: ChatGPT-4o correctly classified all six cases. In bicondylar fractures, the model accurately identified the depression, shear (split), and epiphyseal-diaphyseal dissociation components. Cohen’s kappa coefficient was 1.00, indicating perfect agreement. Conclusion: ChatGPT-4o classified tibial plateau fractures according to the Schatzker-Kfuri system with high diagnostic accuracy, achieving agreement comparable to that of an expert evaluator.

Keywords: Artificial intelligence; tibial plateau; Schatzker-Kfuri classification.

Level of Evidence: III

 


 

INTRODUCTION

Artificial intelligence (AI) was formally introduced in 1956.1 Over the years, increasingly sophisticated computer programs have been developed for use in various fields, including orthopedics and traumatology. However, current platforms require an appropriate prompt or set of information to produce accurate outputs.2

In traumatology and orthopedics, tibial plateau fractures represent a significant diagnostic and therapeutic challenge. The Schatzker classification, introduced in 1979 and widely used in orthopedic practice, categorizes tibial plateau fractures into six types. More recently, a three-dimensional evaluation model based on computed tomography (CT) was developed to better define the involvement of the plateau quadrants, giving rise to the Schatzker-Kfuri classification in 2018.3 This system differentiates unicondylar fractures, bicondylar fractures, and fractures with epiphyseal-diaphyseal dissociation, and supports surgical planning.

The objective of this study was to compare the ability of the multimodal language model ChatGPT-4o to classify tibial plateau fractures from radiographs and CT images with the classifications reported by experts in published case reports.

 

MATERIALS AND METHODS

A retrospective, observational study was conducted to compare the interpretation of an expert observer with that of ChatGPT-4o. To build the prompt, 45 case reports published in databases such as PubMed, Elsevier, and SciELO were included. These reports contained anteroposterior and lateral knee radiographs and axial, coronal, and sagittal CT scans of the knee; case reports with incomplete CT series or incomplete radiographs were excluded. These 45 expert-validated case reports were used to improve the accuracy of ChatGPT-4o’s interpretation. Before uploading, the images of each case were arranged in the following order: anteroposterior knee radiograph, lateral radiograph, and axial, coronal, and sagittal CT slices of the tibial plateau (Figure).
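The fixed per-case upload order described above can be expressed as a simple sorting rule. The following Python sketch is purely illustrative: the view labels (`xray_ap`, `ct_axial`, and so on), the file names, and the function `order_case_images` are hypothetical and do not describe the authors’ files or workflow, which the study does not specify. The sketch also rejects a case with a missing view, mirroring the exclusion of reports with incomplete radiographs or CT series.

```python
# Hypothetical view labels; the article specifies the order of views, not any file naming scheme.
VIEW_ORDER = [
    "xray_ap",       # anteroposterior knee radiograph
    "xray_lateral",  # lateral knee radiograph
    "ct_axial",      # axial CT slices of the tibial plateau
    "ct_coronal",    # coronal CT slices
    "ct_sagittal",   # sagittal CT slices
]


def order_case_images(images):
    """Sort (view_label, filename) pairs into the fixed upload order.

    Raises ValueError for unknown view labels or when any required view is
    missing (mirroring the exclusion of incomplete cases).
    """
    present = {view for view, _ in images}
    unknown = present - set(VIEW_ORDER)
    if unknown:
        raise ValueError(f"unknown view labels: {sorted(unknown)}")
    missing = [view for view in VIEW_ORDER if view not in present]
    if missing:
        raise ValueError(f"incomplete case, missing views: {missing}")
    rank = {view: i for i, view in enumerate(VIEW_ORDER)}
    # sorted() is stable, so several slices of the same view keep their input order
    return sorted(images, key=lambda item: rank[item[0]])


if __name__ == "__main__":
    case = [
        ("ct_coronal", "c1.png"), ("xray_ap", "ap.png"), ("ct_axial", "a1.png"),
        ("xray_lateral", "lat.png"), ("ct_sagittal", "s1.png"), ("ct_axial", "a2.png"),
    ]
    print([name for _, name in order_case_images(case)])
    # prints ['ap.png', 'lat.png', 'a1.png', 'a2.png', 'c1.png', 's1.png']
```

Keeping the order in a single list makes the protocol explicit and lets incomplete cases fail early rather than being uploaded out of sequence.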

 

 

 

 

 

Additional content was incorporated into the prompt, including anatomical descriptions, basic traumatology concepts, examples of split or shear fractures, depression patterns, combined fracture mechanisms, and fractures with epiphyseal-diaphyseal extension.

Descriptive information and the corresponding illustrations were uploaded progressively until the prompt was complete. Subsequently, 45 case reports in DICOM format were uploaded to further refine the model’s interpretive capability. Finally, six expert case reports not previously provided to the model were used for the evaluation phase, each representing a different fracture pattern of the classification system.

ChatGPT-4o sequentially analyzed each image set and proposed a classification based on the Schatzker–Kfuri system. The proposed classification was recorded and compared with the reference classification documented in the original case reports. A classification was considered correct when it matched the expert-reported classification exactly.
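The exact-match criterion can be sketched as follows. This is a minimal illustration, assuming each classification is recorded as a text label (for example, the Roman-numeral types listed in the Results); the helper names `normalize` and `score_exact_match` are hypothetical, and the normalization step (trimming whitespace, upper-casing) is an assumption, since the study does not describe how labels were recorded.

```python
def normalize(label):
    """Hypothetical normalization: trim surrounding whitespace and upper-case."""
    return label.strip().upper()


def score_exact_match(model_labels, reference_labels):
    """Return (n_correct, n_total), counting a case only on an exact label match."""
    if len(model_labels) != len(reference_labels):
        raise ValueError("need exactly one model label per reference label")
    # Exact match after normalization; a near-miss type earns no partial credit
    n_correct = sum(
        normalize(m) == normalize(r) for m, r in zip(model_labels, reference_labels)
    )
    return n_correct, len(reference_labels)


if __name__ == "__main__":
    # Types listed in the Results section, in the order given there
    reference = ["III", "I", "V", "VI", "IV", "II"]
    model = ["III", "I", "V", "VI", "IV", "II"]
    print(score_exact_match(model, reference))  # prints (6, 6)
```

Treating each label as an opaque string keeps the rule strict: any difference in type (or, if recorded, in quadrant or modifier detail) counts as an error.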

 

RESULTS

All six cases were correctly classified by the model. The following fracture patterns were accurately identified:

Pure depression (type III)

Lateral split/shear (type I)

Bicondylar fracture without dissociation (type V)

Epiphyseal-diaphyseal dissociation (type VI)

Medial column involvement (type IV)

Depression with lateral split (type II)

A summary of the comparison is shown in the Table.

Comparison of the ChatGPT-4o classifications with those reported in the six expert-described case reports showed complete agreement (6/6) on both radiographic and three-dimensional assessment. Cohen’s kappa coefficient was 1.00, which indicates perfect agreement.
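For two raters assigning one nominal label per case, Cohen’s kappa is kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e is the agreement expected by chance from each rater’s label frequencies. The Python sketch below shows how a value of 1.00 arises when the model and the reference agree on every case; the function name `cohens_kappa` is hypothetical, and this is not the authors’ analysis code, which the study does not describe.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who each assign one nominal label per case."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(rater_a)
    # Observed agreement p_o: fraction of cases with identical labels
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement p_e: sum over labels of the product of both raters' marginal proportions
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1:
        # Both raters used one identical label for every case: kappa is undefined
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    reference = ["III", "I", "V", "VI", "IV", "II"]  # expert-reported types
    model = ["III", "I", "V", "VI", "IV", "II"]      # model output, all matching
    print(cohens_kappa(model, reference))  # prints 1.0
```

With six cases each carrying a different type (I to VI) and all labels matching, p_o = 1 and p_e = 6 × (1/6 × 1/6) = 1/6, so kappa = (1 - 1/6) / (1 - 1/6) = 1. As a hypothetical contrast, if the type II case had been labeled type I, p_o would be 5/6 while p_e would remain 1/6, giving kappa = (5/6 - 1/6) / (5/6) = 0.8.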

 

 

 

 

 

 

DISCUSSION

The results of this study are consistent with recent publications showing the growing potential of artificial intelligence in the diagnosis of articular fractures. In particular, studies of ChatGPT by Mohammadi et al.4 and of a deep-learning model by Van der Gaast et al.5 have shown that AI models can achieve accuracy levels comparable to those of specialist radiologists in interpreting radiographs. Similar findings have been reported in resource-limited settings, where the addition of three-dimensional CT reconstruction significantly improved diagnostic interpretation, as described by Markhardt et al.6 Furthermore, recent reviews on AI in orthopedic surgery emphasize the need for comparative studies against expert clinicians to establish the validity of AI-based image interpretation. Gyftopoulos et al.7 and Kuo et al.8 reviewed the performance of AI models in musculoskeletal imaging and fracture detection, respectively, and reported promising applicability in real clinical scenarios. Additional contributions from Giordano et al.,9 Singh Sidhu et al.,10 Cai et al.,11 Liu et al.,12 Martinez and Cayon,13 and De Cicco et al.14 offer complementary evidence regarding surgical approaches, associated fracture patterns, and functional prognosis that could eventually be incorporated into automated models for classification and therapeutic planning.

Kuo et al.8 also reported that AI showed sensitivity and specificity approximately 3% lower than those of physicians, although the differences were not statistically significant. Likewise, Alenazi et al.15 highlighted that AI can be a valuable adjunct to clinical judgment, particularly in settings with limited specialist availability.

The Schatzker-Kfuri classification poses a greater interpretive challenge than traditional radiographic systems because it incorporates CT-based, three-dimensional information. Nevertheless, the model in our study accurately identified the fracture pattern in each quadrant and correctly recognized the presence or absence of metaphyseal-diaphyseal dissociation.

Overall, our findings suggest that, when provided with appropriate visual guidance and a structured analytical approach, multimodal language models can serve as useful adjuncts in orthopedic education and diagnostic support in traumatology.

 

CONCLUSIONS

The ChatGPT-4o model correctly classified all six cases of tibial plateau fractures according to the three-dimensional Schatzker-Kfuri classification, achieving complete agreement with an expert observer. This paves the way for the use of AI as a tool for clinical decision support, particularly in training or diagnostic validation settings.

 

REFERENCES

 

1. Lhotská L. Umělá inteligence v medicíně a zdravotnictví: Příležitost a/nebo hrozba? Čas Lék Čes 2023;162(7-8):275-8. Available at: https://www.prolekare.cz/casopisy/casopis-lekaru-ceskych/2023-7-8-1/umela-inteligence-v-medicine-a-zdravotnictvi-prilezitost-a-nebo-hrozba-136669

2. Mucci T. La historia de la inteligencia artificial. IBM Think 2019 [cited 2025 Nov 21]. Available at: https://www.ibm.com/es-es/think/topics/history-of-artificial-intelligence

3. Kfuri M, Schatzker J. Revisiting the Schatzker classification of tibial plateau fractures. Injury 2018;49(12):2252-63. https://doi.org/10.1016/j.injury.2018.07.010

4. Mohammadi M, Parviz S, Parvaz P, Pirmoradi MM, Afzalimoghaddam M, Mirfazaelian H. Diagnostic performance of ChatGPT in tibial plateau fracture in knee X-ray. Emerg Radiol 2025;32(1):59-64. https://doi.org/10.1007/s10140-024-02298-y

5. Van der Gaast N, Bagave P, Assink N, Broos S, Jaarsma RL, Edwards MJR, et al. Deep learning for tibial plateau fracture detection and classification. Knee 2025;54:81-9. https://doi.org/10.1016/j.knee.2025.02.001

6. Markhardt B, Gross JM, Monu J. Schatzker classification of tibial plateau fractures: Use of CT and MR imaging improves assessment. Radiographics 2009;29(2):585-97. https://doi.org/10.1148/rg.292085078

7. Gyftopoulos S, Lin D, Knoll F, Doshi AM, Cantarelli Rodrigues T, Recht MP. Artificial intelligence in musculoskeletal imaging: current status and future directions. AJR Am J Roentgenol 2019;213(3):506-13. https://doi.org/10.2214/AJR.19.21117

8. Kuo R, Harrison C, Curran T, Jones B, Freethy A, Cussons D, et al. Artificial intelligence in fracture detection: A systematic review and meta-analysis. Radiology 2022;304(1):50-62. https://doi.org/10.1148/radiol.211785

9. Giordano V, Schatzker J, Kfuri M. The ‘Hoop’ plate for posterior bicondylar shear tibial plateau fractures: Description of a new surgical technique. J Knee Surg 2022;35(2):123-9. https://doi.org/10.1055/s-0036-1593366

10. Singh Sidhu GA, Hind J, Ashwood N, Kaur H, Bridgwater H, Rajagopalan S. A systematic review of current approaches to tibial plateau. Cureus 2022;14(7):e27183. https://doi.org/10.7759/cureus.27183

11. Cai D, Zhou Y, He W, Yuan J, Liu C, Li R, et al. Automatic segmentation of knee CT images of tibial plateau fractures based on three-dimensional U-Net: assisting junior physicians with Schatzker classification. Eur J Radiol 2024;178:111605. https://doi.org/10.1016/j.ejrad.2024.111605

12. Liu Y, Fang R, Tu B, Zhu Z, Zhang C, Ning R. Correlation of preoperative CT imaging shift parameters of the lateral plateau with lateral meniscal injury in Schatzker IV-C tibial plateau fractures. BMC Musculoskelet Disord 2023;24(1):793. https://doi.org/10.1186/s12891-023-06924-7

13. Martinez A, Cayon M. Fracturas del platillo tibial posterior. Revista Colombiana de Cirugía Ortopédica y Traumatología 1999;13(1):37-41. Available at: https://sccot.org/pdf/RevistaDigital/1999/Vol13N1/37-41.pdf

14. De Cicco F, Verbner J, Abrego M, Taype D, Carabelli G, Barla J, et al. Soporte circunferencial posterior en fracturas de platillo tibial. Rev Asoc Argent Ortop Traumatol 2021;86(2):219-27. https://doi.org/10.15417/issn.1852-7434.2021.86.2.1018

15. Alenazi HK, Alahmari RA, Mubarak Hassan Al Faraj A, Nasser Almurkan M, Saleh Al Hashel IM, Al Hagwi AI, et al. The future of artificial intelligence in X-ray radiography: Enhancing healthcare and workflow efficiency. J Int Crisis Risk Commun Res 2024;7(53):51-3. https://doi.org/10.63278/jicrcr.vi.708

 

 

E. Rivadeneira ORCID ID: https://orcid.org/0009-0006-5784-5700    

E. Lulkin ORCID ID: https://orcid.org/0000-0002-4119-0483

D. E. Freire ORCID ID: https://orcid.org/0009-0000-9882-6027         

F. Bidolegui ORCID ID: https://orcid.org/0000-0002-0502-2300

A. F. Samaniego ORCID ID: https://orcid.org/0000-0002-6616-6471

S. Pereira ORCID ID: https://orcid.org/0000-0001-9475-3158

 

Received on September 5th, 2025. Accepted after evaluation on November 21st, 2025.

Dr. Héctor A. Rivadeneira Jurado, 1bhribadeneirajurado@gmail.com, https://orcid.org/0009-0008-6397-9718

 

How to cite this article: Rivadeneira Jurado HA, Rivadeneira Jurado EA, Espinoza Freire DE, Samaniego AF, Lulkin E, Bidolegui F, Pereira S. Evaluation of the Schatzker-Kfuri Classification of Tibial Plateau Fractures Using Radiographs and Computed Tomography: Comparison Between an Expert Observer and the ChatGPT-4o Model. Rev Asoc Argent Ortop Traumatol 2025;90(6):556-560. https://doi.org/10.15417/issn.1852-7434.2025.90.6.2224

 

 

Article Info

Identification: https://doi.org/10.15417/issn.1852-7434.2025.90.6.2224

Published: December, 2025

Conflict of interests: The authors declare no conflicts of interest.

Copyright: © 2025, Revista de la Asociación Argentina de Ortopedia y Traumatología.

License: This article is under Attribution-NonCommercial-ShareAlike 4.0 International Creative Commons License (CC-BY-NC-SA 4.0).