Retrieval-augmented generation for personalized physician recommendations in online medical services: model development study
Published 2025-03-05
Keywords
- large language models
- Mistral
- SBERT
- triage systems
- retrieval-augmented generation-based physician recommendation
- RAGPR model
Copyright (c) 2025 Yingbin Zheng, Yiwei Yan, Sai Chen, Yunping Cai, Kun Ren, Yishan Liu, Jiaying Zhuang, Min Zhao

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Abstract
Web-based medical services have expanded access to healthcare through remote consultations and streamlined scheduling, but personalized physician recommendations remain limited because they rely on manual triage. This study developed and validated a Retrieval-Augmented Generation-Based Physician Recommendation (RAGPR) model to improve triage performance. Using 646,383 consultation records from the Internet Hospital of the First Affiliated Hospital of Xiamen University, we evaluated three embedding models (FastText, SBERT, OpenAI) for clustering and classification, and three large language models (LLMs: Mistral, GPT-4o-mini, GPT-4o). Three triage staff also assessed model efficiency via questionnaires. FastText performed poorly (F1-score 46%), while SBERT and OpenAI achieved 95% and 96%, respectively. Among the LLMs, GPT-4o reached the highest F1-score (95%) with a performance rating of 4.67, followed by Mistral (94%, 4.56) and GPT-4o-mini (92%, 4.45). Considering accuracy, cost, and ease of implementation together, SBERT and Mistral were the optimal combination. The RAGPR model offers a scalable approach to improving accuracy and personalization in online patient–physician matching.
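The abstract does not describe the retrieval step in detail, so the following is only a minimal sketch of how a retrieval-augmented recommendation could work in general, not the authors' implementation. It assumes that past consultations are stored with an embedding vector and the physician who handled them, that similarity is measured by cosine similarity, and that the retrieved records are placed in a prompt for an LLM. The records, the three-dimensional vectors, and the names `retrieve` and `build_prompt` are all hypothetical; a real system would obtain vectors from an embedding model such as SBERT.

```python
import math

# Hypothetical toy records: in practice each vector would come from an
# embedding model (e.g. SBERT); the 3-d vectors here are made up.
RECORDS = [
    {"text": "chest pain on exertion", "physician": "Cardiology, Dr. A", "vec": [0.9, 0.1, 0.0]},
    {"text": "itchy rash on forearm", "physician": "Dermatology, Dr. B", "vec": [0.0, 0.2, 0.9]},
    {"text": "palpitations at night", "physician": "Cardiology, Dr. C", "vec": [0.8, 0.3, 0.1]},
]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, records, k=2):
    """Return the k records most similar to the query vector, best first."""
    ranked = sorted(records, key=lambda r: cosine(query_vec, r["vec"]), reverse=True)
    return ranked[:k]


def build_prompt(complaint, retrieved):
    """Assemble an LLM prompt that grounds the recommendation in retrieved cases."""
    context = "\n".join(f"- {r['text']} -> {r['physician']}" for r in retrieved)
    return (
        "Similar past consultations and the physicians who handled them:\n"
        f"{context}\n\n"
        f"New patient complaint: {complaint}\n"
        "Recommend the most suitable physician and explain briefly."
    )


# Assumed embedding of the new complaint (made up for illustration).
query_vec = [1.0, 0.2, 0.0]
top = retrieve(query_vec, RECORDS, k=2)
prompt = build_prompt("tightness in the chest when climbing stairs", top)
print(prompt)
```

In this sketch the LLM call itself is left out: the prompt would be passed to whichever model is chosen (the study compared Mistral, GPT-4o-mini, and GPT-4o), and the quality of the retrieved context depends on the embedding model.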