Thieme, Alexander Henry and Miri, Tahir and Marra, Alexandre R and Kobayashi, Takaaki and Rodriguez-Nava, Guillermo and Li, Yiheng and Barba, Thomas and Er, Justus and Gertler and Zamboglou, Constantinos and Alyahya, Lujain and Uhlig, Maximilian and Machiraju, Gautam and Weimann, K. and Lippert, Christoph and Conrad, T. O. F. and Ma, Jackie and Novoa, Roberto and Moor, Michael and Hernandez-Boussard, Tina and Alawad, Mohammed and Salinas, Jorge L. and Mittermaier, Mirja and Gevaert, Olivier (2025) Physician-level classification performance across multiple imaging domains with a diagnostic medical foundation model and a large dataset of annotated medical images. medRxiv. (Submitted)
Full text not available from this repository.
Official URL: https://www.medrxiv.org/content/early/2025/05/31/2...
Abstract
A diagnostic medical foundation model (MedFM) is an artificial intelligence (AI) system engineered to accurately determine diagnoses across various medical imaging modalities and specialties. To train MedFM, we created the PubMed Central Medical Images Dataset (PMCMID), the largest annotated medical image dataset to date, comprising 16,126,659 images from 3,021,780 medical publications. Using AI- and ontology-based methods, we identified 4,482,237 medical images (e.g., clinical photos, X-rays, ultrasounds) and generated comprehensive annotations. To optimize MedFM’s performance and assess biases, 13,266 images were manually annotated to establish a multimodal benchmark. MedFM achieved physician-level performance in diagnosis tasks spanning radiology, dermatology, and infectious diseases without requiring specific training. Additionally, we developed the Image2Paper app, allowing clinicians to upload medical images and retrieve relevant literature. The correct diagnoses appeared within the top ten results in 88.4% and at least one relevant differential diagnosis in 93.0%. MedFM and PMCMID were made publicly available.
Funding
Research reported here was partially supported by the National Cancer Institute (NCI) (R01 CA260271), the Saudi Company for Artificial Intelligence (SCAI) Authority, and the German Federal Ministry for Economic Affairs and Climate Action (BMWK) under the project DAKI-FWS (01MK21009E). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Competing Interest Statement
The authors have declared no competing interest.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below: Open Access Publications
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable. Yes
All data produced in the present study are available upon reasonable request to the authors.
Item Type: | Article |
---|---|
Subjects: | Medicine and Dentistry > Clinical Medicine; Mathematical and Computer Sciences > Artificial Intelligence > Computer Vision; Mathematical and Computer Sciences > Artificial Intelligence > Machine Learning |
Divisions: | Department of Mathematics and Computer Science > Institute of Mathematics; Department of Mathematics and Computer Science > Institute of Mathematics > Comp. Proteomics Group |
ID Code: | 3289 |
Deposited By: | Admin Administrator |
Deposited On: | 26 Sep 2025 12:18 |
Last Modified: | 26 Sep 2025 12:18 |