European Diploma in Breast Imaging Sample Multiple Choice Questions: Evaluating the Capabilities of the Large Language Models

Muhammed Said Beşler

Letter to the Editor

Year 2024, Volume: 4 Issue: 1, 717 - 718, 25.04.2024

Abstract

Seeking a fresh perspective, I aimed to gauge the abilities of multimodal large language models (LLMs) on sample questions from the European Diploma in Breast Imaging (EDBI) examination, an initiative of the European Society of Breast Imaging.
Large language models are expanding what is possible in radiology, from interpreting text and medical images to generating reports (Bhayana, 2024). Generative Pre-trained Transformer 4 (GPT-4) has notably and clearly passed a national mammography board exam (Almeida et al., 2024). As the latest version among multimodal LLMs, GPT-4 can answer questions that require both lower-order and higher-order thinking. Three written sample questions, each of which could have more than one correct choice, were evaluated. The EDBI examination applies no negative marking for incorrect answers (https://www.eusobi.org/european-diploma-in-breast-imaging-edbi/). The scoring system was adapted from the European Diploma in Radiology scoring guidelines (https://www.myebr.org/edir-scoring-faqs). Responses were obtained from Google Gemini, GPT-3.5, and GPT-4 in March 2024. With each question worth 1 point, GPT-4 reached an accuracy of 78%, GPT-3.5 achieved 50%, and Google Gemini scored 22.2%. This notable result on the EDBI sample questions highlights, in particular, GPT-4's potential to aid clinical decision-making. Future studies may assess its performance on questions requiring medical image analysis, such as mammography, breast ultrasound, or breast magnetic resonance imaging.
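The abstract does not spell out the exact scoring rule, so the following is only a minimal sketch of how partial credit on multiple-response questions without negative marking might be computed. The rule used here (credit equals the fraction of correct options that were selected, with no penalty for wrong selections), the function names `question_score` and `exam_accuracy`, and the answer key and responses are all assumptions made for illustration; they are not the author's method, the EDBI questions, or the reported results.

```python
from fractions import Fraction


def question_score(selected, correct):
    """Partial credit for one multiple-response question worth 1 point.

    Assumed rule (not taken from the source): credit is the share of the
    correct options that were selected; wrong selections are not penalised.
    """
    correct = set(correct)
    if not correct:
        raise ValueError("answer key must contain at least one correct option")
    hits = len(set(selected) & correct)
    return Fraction(hits, len(correct))


def exam_accuracy(responses, answer_key):
    """Mean per-question score, expressed as a percentage."""
    total = sum(
        question_score(responses.get(question, ()), key)
        for question, key in answer_key.items()
    )
    return float(100 * total / len(answer_key))


# Hypothetical answer key and model responses, for illustration only.
answer_key = {"Q1": {"A", "C"}, "Q2": {"B", "D", "E"}, "Q3": {"A"}}
responses = {"Q1": {"A", "C"}, "Q2": {"B", "E"}, "Q3": {"A", "B"}}

print(round(exam_accuracy(responses, answer_key), 1))
```

Under this assumed rule, selecting every option would always earn full credit, so a real scoring scheme would need an additional constraint; the sketch only shows why accuracies on three questions can take fractional values.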

Keywords

Artificial intelligence, Medical imaging, Breast diseases

References

Almeida, L. C., Farina, E. M. J. M., Kuriki, P. E. A., Abdala, N., & Kitamura, F. C. (2024). Performance of ChatGPT on the Brazilian Radiology and Diagnostic Imaging and Mammography Board Examinations. Radiol Artif Intell, 6(1), e230103. http://dx.doi.org/10.1148/ryai.230103
Bhayana, R. (2024). Chatbots and Large Language Models in Radiology: A Practical Primer for Clinical and Research Applications. Radiology, 310(1), e232756. http://dx.doi.org/10.1148/radiol.232756

Details

Primary Language	English
Subjects	Health Services and Systems (Other)
Journal Section	Letter to the Editor
Authors	Muhammed Said Beşler 0000-0001-8316-7129
Publication Date	April 25, 2024
Submission Date	April 1, 2024
Acceptance Date	April 21, 2024
Published in Issue	Year 2024 Volume: 4 Issue: 1

Cite

APA	Beşler, M. S. (2024). European Diploma in Breast Imaging Sample Multiple Choice Questions: Evaluating the Capabilities of the Large Language Models. Unika Sağlık Bilimleri Dergisi, 4(1), 717-718.
