Research Symposium

25th annual Undergraduate Research Symposium, April 1, 2025

Luis Sanchez Poster Session 3: 1:45 pm - 2:45 pm / Poster #127



BIO


I am an undergraduate researcher at Florida State University's eHealth Lab, where my focus has been on leveraging natural language processing (NLP) and annotation tools to improve healthcare decision-making. I came across UROP and the eHealth Lab through my interest in the intersection of data science and healthcare, and in exploring ways to improve the accuracy of medical information and patient outcomes through advanced data analysis techniques. My research involved annotating social determinants of health data and optimizing NLP models by tuning their parameters for predictive accuracy.

Annotation and Information Extraction of Social Determinants of Health from Social Worker Notes of Pediatric Transplantation

Authors: Luis Sanchez, Zhe He
Student Major: Statistics and Information Technology
Mentor: Zhe He
Mentor's Department: School of Information
Mentor's College: FSU College of Communication and Information
Co-Presenters: Harjith Pradeep

Abstract


Electronic health records (EHRs) are widely used in developing machine learning models for health outcome predictions. However, these models often exhibit inherent biases due to the omission of crucial risk factors related to social determinants of health (SDoH), which contribute to up to 40% of preventable deaths.
This UROP project aims to establish a rigorous annotation guideline for SDoH-related factors and train transformer-based models to recognize these labels.
The first phase of the project focuses on creating a precise annotation framework to ensure high-quality training data. To achieve this, we collaborated with domain experts in social work and consulted a highly qualified annotator from the University of Florida (UF). Based on these insights, we developed a comprehensive annotation guideline, which will be publicly available in the future. The annotation process consists of two levels: the first level identifies the trigger words corresponding to each SDoH category, while the second level provides more detailed contextual information related to those triggers.
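As a rough illustration of the two-level scheme described above, the sketch below shows one way such an annotation could be represented in Python. The class name, field names, the "Housing" category, and the example sentence are all assumptions made for illustration; they are not taken from the project's annotation guideline.

```python
from dataclasses import dataclass, field


@dataclass
class TriggerAnnotation:
    """Level 1: a trigger word or phrase tied to one SDoH category."""
    category: str  # hypothetical label, e.g. "Housing"
    start: int     # character offset where the trigger begins
    end: int       # character offset just past the trigger
    # Level 2: contextual details attached to this trigger.
    context: dict = field(default_factory=dict)

    def text(self, note: str) -> str:
        """Return the trigger text as it appears in the note."""
        return note[self.start:self.end]


def annotate(note: str, phrase: str, category: str, **context: str) -> TriggerAnnotation:
    """Build an annotation for the first occurrence of `phrase` in `note`."""
    start = note.index(phrase)  # raises ValueError if the phrase is absent
    return TriggerAnnotation(category, start, start + len(phrase), dict(context))


# Invented sentence for illustration only, not real patient data.
note = "Family reports unstable housing since March."
ann = annotate(note, "unstable housing", "Housing", status="current")
print(ann.category, ann.text(note), ann.context)
```

Storing character offsets rather than the trigger text itself keeps each annotation tied to an exact position in the note, which is a common convention in span annotation tools.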
In the second phase, we train transformer-based models to extract trigger words from EHRs. Given the complexity of our multi-label annotation framework, model training presents significant challenges. To address this, we benchmark various BERT-based models to identify the most effective one. Once the trigger words are extracted, the corresponding sentences are fed into large language models (LLMs) to further analyze and extract deeper contextual information.
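The benchmarking step can be sketched as a comparison of multi-label predictions against gold annotations. The abstract does not state which metric is used, so micro-averaged F1 is an assumption here, and the model names, labels, and predictions below are invented placeholders rather than project results.

```python
def micro_f1(gold: list, pred: list) -> float:
    """Micro-averaged F1 over per-sentence label sets."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))  # labels correctly predicted
    fp = sum(len(p - g) for g, p in zip(gold, pred))  # labels predicted but not in gold
    fn = sum(len(g - p) for g, p in zip(gold, pred))  # gold labels that were missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_model(gold: list, predictions_by_model: dict):
    """Score every candidate and return (best model name, all scores)."""
    scores = {name: micro_f1(gold, pred) for name, pred in predictions_by_model.items()}
    return max(scores, key=scores.get), scores


# One set of SDoH labels per sentence; all values are made up.
gold = [{"Housing"}, {"Housing", "Employment"}, set()]
predictions_by_model = {
    "model_a": [{"Housing"}, {"Housing"}, set()],
    "model_b": [{"Housing"}, {"Employment"}, {"Housing"}],
}
winner, scores = best_model(gold, predictions_by_model)
print(winner, scores)
```

Micro-averaging pools true and false positives across all labels, so frequent SDoH categories weigh more heavily; a macro average would be the alternative if rare categories should count equally.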
This project not only enhances SDoH annotation quality but also contributes to improving bias-aware predictive modeling in healthcare.


Keywords: Annotation Tags, Natural Language Processing, Heart Transplant, Social Determinants, Artificial Intelligence