Improving the Performance of a Medical Entity Linking Model

The healthcare industry generates a large volume of data today. As that volume grows, finding the right documents and articles for research, and structuring the data itself, becomes more complicated and time-consuming. NLP-based Medical Entity Linking makes it easier and faster to uncover insights in unstructured clinical and biomedical data.

Medical entity linking is the process of mapping medical terms in healthcare documents to concepts in a knowledge base. An entity is a specific word, concept or phrase, such as the name of a disease, symptom, drug, therapeutic class, dose, treatment, or related term. Each entity is looked up in the knowledge base, a repository of medical terminology that includes standardized names and their synonyms, definitions, and contextual information.

Identifying standardized concepts from a scientific article or clinical documentation is essential to harmonize data and make it available for semantic analysis. Medical entity linking makes this possible by adding metadata to the entity and identifying it in the knowledge base. This harmonizes medical data across clinical records, biomedical text, scientific papers and other documents, making it available for analysis, processing and uncovering insights.

Components of Medical Entity Linking

Medical Entity Linking uses natural language models to extract valuable information from complex and scattered medical text. There are multiple components associated with entity linking. Let’s take a deep dive into each component of Medical Entity Linking.

Named Entity Recognition

Named Entity Recognition (NER) is the process of recognizing named entities in text. The variety of entities recognized depends on the model. In the medical domain, commonly recognized entity types include diseases, chemicals, symptoms, drugs, genes, and adverse drug reactions (ADRs). To detect such entities, a state-of-the-art model is adopted and fine-tuned on data corresponding to the entity types of interest.
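To make the output of this step concrete, here is a minimal sketch of how per-token predictions can be collapsed into entities. It assumes the fine-tuned model emits BIO tags (B- for the beginning of an entity, I- for inside, O for outside), which is a common convention but is not stated in this article. The function name, the labels SYMPTOM and DRUG, and the sample sentence are all hypothetical.

```python
from typing import List, Optional, Tuple


def bio_to_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collapse per-token BIO tags into (entity_text, entity_type) pairs."""
    entities: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts; flush any entity in progress.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_tokens and tag[2:] == current_type:
            # Continuation of the current entity.
            current_tokens.append(token)
        else:
            # An "O" tag or an inconsistent "I-" tag closes the current entity.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    if current_tokens:
        entities.append((" ".join(current_tokens), current_type))
    return entities


# Hypothetical model output for one sentence.
tokens = ["Patient", "developed", "severe", "headache", "after", "taking", "ibuprofen"]
tags = ["O", "O", "B-SYMPTOM", "I-SYMPTOM", "O", "O", "B-DRUG"]
print(bio_to_entities(tokens, tags))
```

The resulting (text, type) pairs are what the later steps take as input: the text is matched against the knowledge base, and the type can be used to narrow down candidate concepts.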

Entity Linking

In Entity Linking, the detected entities are linked to concepts in a knowledge base (KB). A KB is a corpus of medical terms with metadata such as names, definitions, and unique concept identifiers. One such knowledge base is the Unified Medical Language System (UMLS), which is essentially a thesaurus for medical terms and incorporates a multitude of other vocabularies. Other KBs include MeSH (Medical Subject Headings), the Gene Ontology (for genes), and the Human Phenotype Ontology. Recognized entities are linked to KB concepts using techniques such as TF-IDF similarity and text matching.
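As an illustration of TF-IDF-based linking, the sketch below scores a mention against every name in a tiny knowledge base using character trigram TF-IDF vectors and cosine similarity. Everything here is an assumption made for the example: the knowledge base is hypothetical, the concept IDs are placeholders rather than real UMLS identifiers, and character trigrams are just one reasonable choice of feature.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical miniature knowledge base: concept ID -> names and synonyms.
KB: Dict[str, List[str]] = {
    "C-001": ["myocardial infarction", "heart attack"],
    "C-002": ["hypertension", "high blood pressure"],
    "C-003": ["diabetes mellitus", "diabetes"],
}


def char_ngrams(text: str, n: int = 3) -> List[str]:
    """Split lowercased, space-padded text into character n-grams."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def tfidf_vector(text: str, idf: Dict[str, float]) -> Dict[str, float]:
    """Build an L2-normalized TF-IDF vector, ignoring n-grams unseen in the KB."""
    counts = Counter(char_ngrams(text))
    vec = {g: c * idf[g] for g, c in counts.items() if g in idf}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {g: w / norm for g, w in vec.items()} if norm else {}


def build_index(kb: Dict[str, List[str]]):
    """Compute smoothed IDF weights and one vector per KB name."""
    names = [(cid, name) for cid, synonyms in kb.items() for name in synonyms]
    doc_freq: Counter = Counter()
    for _, name in names:
        doc_freq.update(set(char_ngrams(name)))
    n_docs = len(names)
    idf = {g: math.log((1 + n_docs) / (1 + df)) + 1.0 for g, df in doc_freq.items()}
    vectors = [(cid, name, tfidf_vector(name, idf)) for cid, name in names]
    return idf, vectors


def link(mention: str, idf, vectors, top_k: int = 3) -> List[Tuple[str, str, float]]:
    """Return the top_k (concept_id, matched_name, cosine_score) candidates."""
    query = tfidf_vector(mention, idf)
    scored = [
        (cid, name, sum(w * vec.get(g, 0.0) for g, w in query.items()))
        for cid, name, vec in vectors
    ]
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored[:top_k]


idf, vectors = build_index(KB)
for candidate in link("heart attacks", idf, vectors):
    print(candidate)
```

Character n-grams tolerate small surface differences such as plurals or minor misspellings, which is why the mention "heart attacks" still ranks the "heart attack" entry first. A production system would index a real KB and would likely use an optimized sparse-vector library instead of Python dictionaries.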

Named Entity Disambiguation

The next step is to resolve ambiguity in the text so that each entity is linked to the correct concept in the KB. A single entity may have multiple meanings: for instance, the word ‘cold’ can refer to a low temperature or to an illness (the common cold), depending on the context.
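One simple way to use context, sketched here under stated assumptions, is to compare the words around the mention with a short definition of each candidate concept and pick the candidate with the largest overlap. This word-overlap heuristic is a swapped-in illustration, not the method described in this article, and the sense IDs, definitions, and stopword list are hypothetical.

```python
import re
from typing import Dict, Set

# Hypothetical candidate senses for the mention "cold"; the IDs and definitions
# are illustrative placeholders, not entries from a real knowledge base.
CANDIDATES: Dict[str, str] = {
    "SENSE-TEMP": "low temperature of the weather or a chilled environment",
    "SENSE-ILLNESS": "viral infection of the nose and throat causing cough sneezing and runny nose",
}

STOPWORDS = {"a", "an", "the", "of", "and", "or", "to", "in", "with", "has", "for", "is"}


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and keep alphabetic words that are not stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def disambiguate(context: str, candidates: Dict[str, str]) -> str:
    """Return the candidate ID whose definition shares the most words with the context."""
    context_words = tokenize(context)
    return max(candidates, key=lambda cid: len(context_words & tokenize(candidates[cid])))


print(disambiguate("The patient has a cold with cough and a runny nose", CANDIDATES))
print(disambiguate("Store the vaccine in a cold environment at low temperature", CANDIDATES))
```

Word overlap is brittle when the context and the definitions use different vocabulary, so practical systems usually replace it with embeddings from a language model while keeping the same idea of scoring each candidate against the context.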

Concept Linking

Concept Linking, also known as concept normalization, can be leveraged to improve the accuracy of the model. The most common way to increase accuracy is to take the semantic types of entities into consideration while performing concept normalization. This usually increases accuracy by 1-5%, depending on the type of entity recognition task. Another way of enhancing accuracy is to rerank the candidate concepts and then use the entity's semantic type to further improve the result.
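The sketch below shows one way semantic types could feed into reranking: candidates whose semantic type is compatible with the label assigned by the NER step receive a score bonus before sorting. The candidate scores, the bonus value, the concept IDs, and the mapping from NER labels to semantic types are all assumptions chosen for illustration.

```python
from typing import Dict, List, NamedTuple, Set


class Candidate(NamedTuple):
    concept_id: str
    name: str
    semantic_type: str
    score: float  # similarity score from the linking step


def rerank_by_semantic_type(
    candidates: List[Candidate], allowed_types: Set[str], bonus: float = 0.2
) -> List[Candidate]:
    """Boost candidates whose semantic type is allowed, then sort by score."""
    rescored = [
        c._replace(score=c.score + bonus) if c.semantic_type in allowed_types else c
        for c in candidates
    ]
    return sorted(rescored, key=lambda c: c.score, reverse=True)


# Hypothetical candidates for the mention "cold", with made-up scores.
candidates = [
    Candidate("X-1", "cold temperature", "Natural Phenomenon or Process", 0.62),
    Candidate("X-2", "common cold", "Disease or Syndrome", 0.58),
]

# Hypothetical mapping from NER labels to compatible semantic types.
ner_label_to_types: Dict[str, Set[str]] = {"DISEASE": {"Disease or Syndrome"}}

best = rerank_by_semantic_type(candidates, ner_label_to_types["DISEASE"])[0]
print(best.concept_id, best.name)
```

A soft bonus keeps candidates of other types in the list in case the NER label is wrong. Filtering them out entirely is the stricter alternative.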

Today, many state-of-the-art language models are available that can be fine-tuned on the desired downstream task of medical entity recognition. Language models pre-trained on medical texts, such as PubMed articles and Electronic Health Records (EHR), tend to perform better than models trained on non-medical texts.

With the increasing amounts of healthcare data and computational power, improved models are required to address the shortcomings of the existing methods and boost the performance of medical entity linking. Quantiphi’s NLP capabilities enable the extraction of healthcare information from medical text. This information includes medical concepts, medical functions, and therapeutic relations that help healthcare professionals make critical, data-backed decisions faster.

To improve the performance of medical entity linking models with NLP, get in touch with our experts now.

Written by Ayush Raj
