Natural Language Processing Helps Detect SDOH Issues, Research Shows

Monday, November 20, 2023

A physician performs a blood test. The results show that the patient has a vitamin D deficiency. The condition is assigned ICD-10-CM code E55.9.

That’s how the process should work.

But when a person experiences financial stress, there is no straightforward blood test and no commonly used ICD-10 code. So, noting and coding this social determinant of health (SDOH) is a much more complicated matter.

A study recently published in the International Journal of Medical Informatics, however, confirms what has been known for some time: natural language processing (NLP) can help. Perhaps more importantly, the study also demonstrates that healthcare organizations don’t have to reinvent the proverbial wheel to detect SDOH with NLP. Instead, they can port established NLP models from one institution to another with relative ease.

Researchers who participated in this study took a rule-based deterministic state machine NLP model, developed to extract financial insecurity and housing instability from clinical notes at health systems in central Indiana, and ported it to University of Florida Health. They found that the model was easy to port: they were able to install the software in a new environment and update the models to meet the needs of new data. They also observed a high level of generalizability, as measured by the performance of the NLP models when applied to new data.

“Where we live, where we work, access to food, transportation and so on – all these social factors really impact our health,” says Tanja Magoc, PhD, one of the study’s authors and an artificial intelligence (AI) engineer in the Quality and Patient Safety Initiative at the University of Florida, Gainesville. “However, this information is not really collected in a very organized way in most healthcare systems.”

Challenges to Collecting SDOH Data

While there are some SDOH ICD-10 codes available, up until a few years ago, they were not commonly used. “ICD-10-CM does include codes to capture social determinants of health, including, for example, code category Z59 (problems related to housing and economic circumstances),” says Mary Stanfill, MBI, RHIA, CCS, CCS-P, FAHIMA, vice president of consulting services for UASI, a consulting company based in Cincinnati, OH.

Category Z59 is one of several SDOH code categories that have existed for some time. These categories are frequently expanded to meet evolving needs.

While these codes might be more fully leveraged in the future, SDOH information currently is typically collected via informal conversations between providers and their patients.

“Most of the time this information is collected when a nurse or a physician is talking to a patient,” Magoc says. “So, through random friendly conversations, these professionals might find out that this patient missed an appointment because they couldn't afford a copay or because they didn't have transportation to get to their appointment. And this is then recorded in clinical notes as an informal narrative note. So we need NLP to actually extract the SDOH.”

The journal study is the latest in a growing body of work illustrating the value of leveraging NLP to detect and document SDOH. Studies published in the Journal of the American Medical Informatics Association (JAMIA) in October 2021 and April 2023 confirm that NLP is a key technology for extracting SDOH information from clinical text and expanding its utility in patient care and research. These studies show that NLP algorithms can successfully extract housing, financial, and employment data from electronic health records, and that the models measure social determinants well enough for researchers to develop risk models and for clinicians and health systems to use these factors to improve care.

The use of such NLP models could potentially bring much needed relief to HI professionals, specifically coders. “Currently, coders really have to read every single note on their own. And then from that note, they have to define what are really the ICD codes that they need to code for their specific hospital ... so if you have this NLP model that can actually extract all the social determinants of health … [then] you don't have to read every note to see whether you need to code for some of these social risk factors. NLP can actually tell you, ‘oh yes, this patient has housing instability.’ So [NLP] could actually help those professionals to not read every note but actually have a software pull out information for them,” Stanfill says.

Bumps in the NLP Road

Using NLP, however, presents its own challenges. Software must be developed to read the notes and extract the requisite information. In this case, the software uses a rule-based deterministic state machine NLP model that relies on a list of phrases, sentences, and words relevant to specific social factors, Magoc says.

“For example, for the NLP model to detect housing status, it needs to be looking for words like ‘homeless,’ ‘shelter,’ ‘patient sleeps in a car,’ or ‘patient stays with another family.’ So we start with these phrases and then from there, our software looks at the context in which the phrases were used,” she says. “It needs to be capable of detecting if this is something that occurred in the past because sometimes a clinician might note that a patient didn’t have a home three years ago, but now they do.”
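The phrase-plus-context approach Magoc describes can be illustrated with a minimal Python sketch. The phrase list, the historical cues, and the function name `find_housing_mentions` are hypothetical and are not taken from the study; the sketch only shows the general pattern of matching trigger phrases and then checking the surrounding sentence for cues that the finding lies in the past.

```python
import re
from dataclasses import dataclass

# Hypothetical trigger phrases for housing instability (illustrative only).
HOUSING_PHRASES = ["homeless", "shelter", "sleeps in a car", "stays with another family"]

# Hypothetical cues suggesting a finding describes the past, not the present.
HISTORICAL_CUES = [
    r"\b(?:years?|months?|weeks?)\s+ago\b",
    r"\bhistory of\b",
    r"\bpreviously\b",
    r"\bin the past\b",
]


@dataclass
class Finding:
    phrase: str       # trigger phrase that matched
    sentence: str     # sentence the phrase appeared in
    historical: bool  # True if the sentence contains a historical cue


def find_housing_mentions(note: str) -> list[Finding]:
    """Match trigger phrases sentence by sentence and flag historical context."""
    findings = []
    # Naive sentence split on ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", note.strip()):
        lowered = sentence.lower()
        historical = any(re.search(cue, lowered) for cue in HISTORICAL_CUES)
        for phrase in HOUSING_PHRASES:
            if phrase in lowered:
                findings.append(Finding(phrase, sentence, historical))
    return findings
```

For a note such as "Patient sleeps in a car since losing job. Was homeless three years ago but now has an apartment.", the first match is flagged as current and the second as historical. A real system would need much more, such as negation handling ("denies homelessness") and robust sentence splitting.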

Healthcare leaders have been reluctant to port previously established models from one location to another, largely out of concern about accommodating regional variations in language.

In the recently published study, the rule-based deterministic state machine NLP model was built so that words or phrases can simply be added to accommodate emerging SDOH or variations in language. Historically, other NLP models have required developers to manually annotate these changes and then completely retrain the model, which is a labor-intensive process, according to Magoc. The rule-based model, by contrast, extracted social factors from clinical notes and showed strong portability and generalizability across organizationally and geographically distinct institutions. With only relatively simple modifications, researchers obtained promising performance from it.
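One simple way to build a deterministic state machine over phrases is a token-level trie, where each node is a state and each token moves the machine to at most one next state. The Python sketch below, built around a hypothetical `PhraseMachine` class, is an illustrative assumption and not the study's implementation; it shows why extending such a model means adding phrases rather than annotating data and retraining.

```python
class PhraseMachine:
    """Deterministic phrase matcher built as a token-level trie.

    Each trie node is a state. From any state, a token leads to at most one
    next state, so matching is deterministic. Adding a phrase only adds
    states and transitions; nothing is retrained.
    """

    def __init__(self) -> None:
        # transitions[state][token] -> next state; state 0 is the start state.
        self.transitions: list[dict[str, int]] = [{}]
        # Accepting states map to the label of the phrase ending there.
        self.accepting: dict[int, str] = {}

    def add_phrase(self, phrase: str, label: str) -> None:
        state = 0
        for token in phrase.lower().split():
            nxt = self.transitions[state].get(token)
            if nxt is None:
                # Create a new state for this token.
                self.transitions.append({})
                nxt = len(self.transitions) - 1
                self.transitions[state][token] = nxt
            state = nxt
        self.accepting[state] = label

    def scan(self, text: str) -> list[tuple[str, str]]:
        """Return (label, matched phrase) pairs, trying every start position."""
        # Crude tokenization: lowercase, drop commas and periods, split on spaces.
        tokens = text.lower().replace(",", " ").replace(".", " ").split()
        matches = []
        for start in range(len(tokens)):
            state = 0
            for end in range(start, len(tokens)):
                state = self.transitions[state].get(tokens[end], -1)
                if state == -1:
                    break  # no transition: this start position is done
                if state in self.accepting:
                    matches.append((self.accepting[state], " ".join(tokens[start : end + 1])))
        return matches
```

In this design, calling `add_phrase("couch surfing", "housing_instability")` changes only the transition table, so the new phrase is recognized on the very next call to `scan`, with no training step.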

More specifically, researchers found that the words used to describe SDOH didn’t vary much from region to region. What did vary were the local names of resources: community service organizations, food pantries, and shelters have different names in different places. Because the model makes it easy to add such phrases, it can be tweaked and ported from institution to institution, even when healthcare organizations are located in different regions of the country with widely divergent dialects.
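The porting pattern described here, a shared lexicon plus a short list of site-specific resource names, can be sketched in a few lines of Python. In this hypothetical example, `BASE_LEXICON`, `port_lexicon`, `tag_note`, and the shelter name "Sample Street Mission" are invented for illustration; the study's actual lexicons are not reproduced here.

```python
# Hypothetical shared lexicon. Per the study, the wording used to describe
# SDOH varied little between regions, so phrases like these could be reused.
BASE_LEXICON: dict[str, set[str]] = {
    "housing_instability": {"homeless", "shelter", "sleeps in a car"},
    "financial_insecurity": {"can't afford", "unable to pay"},
}


def port_lexicon(
    base: dict[str, set[str]], local_additions: dict[str, set[str]]
) -> dict[str, set[str]]:
    """Return a copy of the shared lexicon extended with site-specific phrases.

    The base lexicon is copied, not mutated, so each site keeps its own
    additions (for example, names of local shelters) separate.
    """
    merged = {label: set(phrases) for label, phrases in base.items()}
    for label, phrases in local_additions.items():
        merged.setdefault(label, set()).update(p.lower() for p in phrases)
    return merged


def tag_note(note: str, lexicon: dict[str, set[str]]) -> list[str]:
    """Return the sorted labels whose phrases appear anywhere in the note."""
    lowered = note.lower()
    return sorted(
        label
        for label, phrases in lexicon.items()
        if any(phrase in lowered for phrase in phrases)
    )
```

With the shared lexicon alone, a note saying "Patient has been staying at Sample Street Mission." produces no label; after a site adds that local shelter name through `port_lexicon`, the same note is tagged `housing_instability`, while the shared lexicon stays unchanged for other sites.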

“This is one of the first studies that shows it is possible to port models from organization to organization. So, we’ve proved that it is possible to share what we have accomplished so we don’t have to develop new models for the same thing at every single institution. That's how we advance science and advance healthcare much faster,” Magoc says.

Being able to widely deploy NLP models is expected to make it easier for HI professionals, specifically coders, to do their jobs. Instead of spending an inordinate amount of time reading every single note in search of relevant details, coders could rely on the NLP model to quickly extract the SDOH information recorded in those notes.

“As a [health information] professional for more than 30 years, I have always considered it my responsibility to ensure providers have the information they need to make patient care decisions. This spans all types of health information from abnormal test values that require follow-up care to social factors that present barriers to such follow-up care,” says Stanfill. “With the growing understanding of the importance of SDOH information, it is incumbent on us to find ways to capture, organize, and present this information to care providers. Therefore, it’s important that we are knowledgeable of the variety of structured and unstructured data processing methods that may help accomplish this.”

John McCormack is a Riverside, IL-based freelance writer covering healthcare information technology, policy, and clinical care issues.
