Vol. 5 No. 1 (2025): Issue 5
Articles

Retrieval-augmented generation for personalized physician recommendations in online medical services: model development study

Yingbin Zheng
Biomedical Big Data Center, The First Affiliated Hospital of Xiamen University, School of Medicine, Xiamen University, Xiamen, China
Yiwei Yan
Biomedical Big Data Center, The First Affiliated Hospital of Xiamen University, School of Medicine, Xiamen University, Xiamen, China
Sai Chen
Meteorological Disaster Prevention Technology Center, Xiamen Meteorological Bureau, Xiamen, China
Yunping Cai
Meteorological Disaster Prevention Technology Center, Xiamen Meteorological Bureau, Xiamen, China
Kun Ren
Meteorological Disaster Prevention Technology Center, Xiamen Meteorological Bureau, Xiamen, China
Yishan Liu
School of Software Engineering, Taiyuan University of Technology, Taiyuan, China
Jiaying Zhuang
School of Software Engineering, Taiyuan University of Technology, Taiyuan, China
Min Zhao
School of Software Engineering, Taiyuan University of Technology, Taiyuan, China

Published 2025-03-05

Keywords

  • large language models,
  • Mistral,
  • SBERT,
  • triage systems,
  • retrieval-augmented generation-based physician recommendation,
  • RAGPR model

How to Cite

Zheng, Y., Yan, Y., Chen, S., Cai, Y., Ren, K., Liu, Y., … Zhao, M. (2025). Retrieval-augmented generation for personalized physician recommendations in online medical services: model development study. Optimizations in Applied Machine Learning, 5(1). https://doi.org/10.71070/oaml.v5i1.141

Abstract

Web-based medical services have expanded access to healthcare through remote consultations and streamlined scheduling, but personalized physician recommendations remain limited because they rely on manual triage. This study developed and validated a Retrieval-Augmented Generation-Based Physician Recommendation (RAGPR) model to enhance triage performance. Using 646,383 consultation records from the Internet Hospital of the First Affiliated Hospital of Xiamen University, we evaluated embedding models (FastText, SBERT, OpenAI) for clustering and classification, as well as large language models (Mistral, GPT-4o-mini, GPT-4o). Three triage staff also assessed model efficiency via questionnaires. Results showed that FastText performed poorly (F1-score 46%), while SBERT and OpenAI achieved 95% and 96%, respectively. Among LLMs, GPT-4o reached the highest F1-score (95%) with a performance rating of 4.67, followed by Mistral (94%, 4.56) and GPT-4o-mini (92%, 4.45). Considering accuracy, cost, and implementation, SBERT and Mistral were optimal. The RAGPR model offers a scalable approach to improving accuracy and personalization in online patient–physician matching.
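
The retrieve-then-generate idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the physician profiles, the function names (`embed`, `retrieve`, `build_prompt`), and the bag-of-words embedding are all hypothetical stand-ins chosen so the example runs with the standard library only. In a real system the `embed` function would presumably be replaced by a sentence encoder such as SBERT, and the assembled prompt would be sent to an LLM such as Mistral.

```python
import math
from collections import Counter

# Hypothetical physician profiles (illustrative only, not data from the study).
PHYSICIANS = {
    "Dr. A (Cardiology)": "chest pain palpitations hypertension heart failure",
    "Dr. B (Dermatology)": "rash eczema acne itching skin lesions",
    "Dr. C (Gastroenterology)": "stomach pain acid reflux diarrhea bloating",
}


def embed(text):
    """Toy bag-of-words embedding; an assumed stand-in for a sentence encoder."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, k=2):
    """Return the names of the k profiles most similar to the patient query."""
    q = embed(query)
    scored = [(cosine(q, embed(profile)), name) for name, profile in PHYSICIANS.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:k]]


def build_prompt(query, candidates):
    """Assemble the retrieved candidates into a prompt for a downstream LLM."""
    lines = [f"- {name}: {PHYSICIANS[name]}" for name in candidates]
    return (
        "Patient complaint: " + query + "\n"
        "Candidate physicians:\n" + "\n".join(lines) + "\n"
        "Recommend the most suitable physician and explain briefly."
    )


if __name__ == "__main__":
    complaint = "itching rash on my skin"
    print(build_prompt(complaint, retrieve(complaint, k=2)))
```

The retrieval step narrows the candidate set before generation, so the language model only has to choose among a few relevant physicians rather than the full roster.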
