Large-Scale Automated Encoding of Veterinary Diagnoses with Transformer-Based Language Models

Event Date

Location
Virtual

Presented by: Mayla Boguslav, Adam Kiehl, David Kott

Summary: Encoding medical records into standardized medical terminologies greatly increases the research value of such data by promoting usability, extractability, and interoperability. While some data can be encoded in a straightforward manner using rule-based methods, clinicians often record diagnoses as free text, which requires sophisticated natural language processing (artificial intelligence) methods to encode. Our research builds on previously developed automated encoding methods and publicly available pretrained large language models to achieve state-of-the-art results on a clinically comprehensive scale.

Mayla Boguslav Bio: Mayla R. Boguslav, PhD, is a Postdoctoral Fellow in Mathematics at Colorado State University (CSU) with a research focus on natural language processing. At CSU, she works with the Data Science Research Institute on identifying SNOMED-CT diagnosis codes for veterinary records. In general, she seeks to determine what isn't (yet). She works to uncover our collective unanswered scientific questions by creating taxonomies and using ontologies (controlled vocabularies). Focusing on what science has yet to answer helps researchers locate the role of their ongoing work, clarifies what scientists are saying and what they are not, and rebuilds trust in science. Mayla received the 2023 AMIA Edward H. Shortliffe Doctoral Dissertation Award for her PhD in Computational Biosciences from the University of Colorado Anschutz Medical Campus.

Adam Kiehl Bio: Adam Kiehl is a Health Data Scientist for Colorado State University’s (CSU) College of Veterinary Medicine and Biomedical Sciences (CVMBS) Research IT team. He recently received a Bachelor’s degree in Data Science and a Master of Applied Statistics degree from CSU. As a student intern, he worked on a long-term data harmonization project aimed at standardizing CSU’s veterinary electronic health records (EHR) to the OHDSI OMOP common data model. He now works on that project full-time, and his efforts have expanded to include the development of natural language processing (NLP) models to automatically encode free-text veterinary records into the SNOMED-CT medical terminology. His primary professional ambition is to advance CSU’s interactions with health informatics in collaborative and innovative ways.

David Kott Bio: David Kott is currently a graduate student in Mathematics at Colorado State University (CSU). His research interests lie at the intersection of machine learning and distributed computing. David is dedicated to exploring innovative ways to leverage these fields to solve complex mathematical problems. His work is characterized by a strong commitment to research and a passion for pushing the boundaries of knowledge.