Bridging the Knowledge Gap: Integrating Biomedical Ontologies to Enhance BERT-Based Medical MCQA
Date: 17th Oct 2023
Time: 04:00 PM
Venue: Seminar Hall (SSB-333)
PAST EVENT
Details
We address the challenge of enhancing BERT-based language models for specialized medical tasks by integrating them with biomedical ontologies. Despite their strong language understanding capabilities, BERT models often lack domain-specific knowledge, particularly in medical contexts. To bridge this gap, we introduce BioOntoBERT, a BERT-based model pre-trained on multiple biomedical ontologies. To generate knowledge-rich documents, we develop the Onto2Sen system, which extracts entity names, synonyms, definitions, and concept relationships from ontologies, enriching the model's knowledge of biomedical concepts. We evaluate on two medical multiple-choice question answering (MCQA) benchmarks, MedMCQA and MedQA. The results show that BioOntoBERT improves over baseline models such as BERT, SciBERT, BioBERT, and PubMedBERT on both datasets. BioOntoBERT attains this improvement by incorporating only 158 MB of ontology-generated data during pre-training, just 0.7% of the data used to pre-train PubMedBERT. Extending this work, we fine-tune BERT-based models on BioOntoMCQA, a synthetic MCQA dataset constructed from biomedical ontologies. We apply this fine-tuning strategy to various BERT models and assess their performance on the MedMCQA and MedQA datasets. The results demonstrate notable accuracy improvements, highlighting the role of biomedical ontologies in improving language models for medical domains. Our findings underscore the importance of fine-tuning with ontology-generated data and of model adaptation within specialized domains.
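To make the ontology-verbalization idea concrete, the sketch below shows one way a concept's name, synonyms, definition, and relationships could be turned into plain-text sentences suitable for pre-training. This is a minimal illustration, not the authors' Onto2Sen implementation; the OntologyConcept record, the relation templates, and the example concept are all hypothetical.

```python
# Minimal sketch (assumed, not the actual Onto2Sen code): verbalize an
# ontology concept as natural-language sentences for pre-training data.
from dataclasses import dataclass, field


@dataclass
class OntologyConcept:
    name: str
    synonyms: list = field(default_factory=list)
    definition: str = ""
    # (relation, target concept name) pairs, e.g. ("is_a", "myocardial disease")
    relations: list = field(default_factory=list)


def concept_to_sentences(concept: OntologyConcept) -> list:
    """Turn one concept's fields into simple template-based sentences."""
    sentences = []
    if concept.definition:
        sentences.append(f"{concept.name} is defined as {concept.definition}")
    for synonym in concept.synonyms:
        sentences.append(f"{concept.name} is also known as {synonym}.")
    for relation, target in concept.relations:
        # One template per relation type; a real system may use richer templates.
        verb = {"is_a": "is a kind of", "part_of": "is part of"}.get(relation, relation)
        sentences.append(f"{concept.name} {verb} {target}.")
    return sentences


if __name__ == "__main__":
    example = OntologyConcept(
        name="myocardial infarction",
        synonyms=["heart attack"],
        definition="necrosis of the myocardium caused by obstruction of its blood supply.",
        relations=[("is_a", "myocardial disease")],
    )
    for sentence in concept_to_sentences(example):
        print(sentence)
```

Sentences generated this way from many concepts would form the kind of ontology-derived corpus described in the talk, which can then feed standard masked-language-model pre-training.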
Speakers
Mr. Sahil (CS20S017)
Department of Computer Science & Engineering