A Framework for Autonomous Coding in the Inpatient Setting

Editor’s Note: This is an opinion piece submitted to the Journal of AHIMA. The opinions expressed here are those of the author and not of AHIMA.

I would like to propose a framework for developing an autonomous coding workflow in the hospital inpatient setting, one in which the autonomy of code assignment carries a different connotation, and operates differently, than it does in the algorithms and processes of other current coding automation techniques.

The autonomy within this proposed workflow, both in the ownership of a final set of assigned diagnosis and procedure codes and in the stewardship of medical record documentation that meets legal, regulatory, and financial compliance requirements, would reflect a true collaboration between a medical coder (or other medical record user) and the machine learning/artificial intelligence (ML/AI) processes of the framework’s autonomous coding methodology, and its results would be produced by that collaboration.

Assignability 

To further explore this proposal, let me introduce a concept from medical coding called “assignability.” “Assignability” is the capability or degree of potential for a particular diagnosis or procedure code to be assigned in the final ICD-10 code set of a hospital account, based on the documentation contained in the applicable hospital encounter. 

Clinically relevant terms and phrases are identified and designated from medical record documentation as contributory keywords toward potential ICD-10 diagnosis and procedure code assignment in a few general ways:  

  1. A keyword or set of keywords has a partial or a total match with an entry or entries in the ICD-10-CM Alphabetic Index or ICD-10-PCS Procedural Index.

  2. A keyword or set of keywords matches an entry or entries in some kind of matrix of logical association, and subsequent terms and phrases are offered as selections of associable keywords, where choices among them eventually lead to those same ICD-10 Index entries.

  3. Pattern-recognition algorithms in AI are used to mimic how coders make ICD-10 code assignment decisions, based on known and learned rules for interpreting the syntax of medical record documentation.

  4. Computational linguistics is used to break down the narrative of medical record documentation with statistical approaches (terms, phrases, paragraphs, and related nominal or categorical data are given a numerical representation and translated into some kind of numerical data construct), and these numerical/statistical constructs are then related to assignable ICD-10 Index entries.

Broadly speaking, the first two methods represent traditional approaches to computer-assisted coding (CAC) that employ natural language processing (NLP) techniques. The latter two methods represent the advanced and innovative AI techniques currently being developed to enhance medical coding automation products.
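
As a rough illustration of the first method in the list above (a partial or total keyword match against an index), a minimal Python sketch follows. The `TOY_INDEX` dictionary and the `match_keywords` function are hypothetical names introduced only for this example; the three entries are a toy stand-in and not the actual ICD-10-CM Alphabetic Index, and the whole-word matching rule is an assumption chosen for simplicity.

```python
# Toy stand-in for an alphabetic index: entry text -> ICD-10-CM code.
# These three entries are illustrative only, not the real Index.
TOY_INDEX = {
    "pneumonia": "J18.9",
    "hypertension": "I10",
    "acute kidney failure": "N17.9",
}

def match_keywords(keywords, index=TOY_INDEX):
    """Classify each index entry as a 'total' or 'partial' match.

    An entry is a total match when every word of the entry appears among
    the extracted keywords, and a partial match when only some words do.
    """
    found = {k.lower() for k in keywords}
    matches = []
    for entry, code in index.items():
        terms = set(entry.split())
        overlap = terms & found
        if overlap == terms:
            matches.append((entry, code, "total"))
        elif overlap:
            matches.append((entry, code, "partial"))
    return matches

for match in match_keywords(["acute", "kidney", "Pneumonia"]):
    print(match)
```

Note that the result is still a flat list of candidate codes: it says that a match exists, but nothing about where in the documentation the keywords were found.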

More importantly, though, it is well known that each of the methods described above has a strictly limited ceiling on its usability. This limitation largely stems from the complex nature of hospital inpatient documentation, which often blends unstructured and structured text. The high levels of complexity, variability, and diversity within this documentation can significantly challenge, and sometimes even obscure, the processes aimed at identifying and capturing relevant keywords or phrases. This step is essential, as it precedes the establishment of any meaningful associations and correspondences between these keywords and potential ICD-10 code assignments.

The output of any generated collection of potential ICD-10 code assignments, whether from sets of keywords identified and extracted by an NLP process or by an AI algorithm, has no greater data structure than that of an “unordered list” whose elements follow a straight flat-file data storage arrangement. These identified and extracted keywords possess no more than a nominal data value attribute because their clinical relevance is only based on “linguistic equivalence.”  

As a result, the measures of assignability of potential ICD-10 diagnosis and procedure codes based on these keywords can likewise fall only on a nominal scale. No matter how such a list of potential code assignments is displayed, or how the keyword sources are summarized with their corresponding code assignments, the associated clinical relevance can indicate only that a potential code exists within the narrative of the encounter documentation. It cannot indicate anything further about that code’s context within that narrative.

The framework I am proposing in this article would add data dimensionality and data granularity to such a list of potential ICD-10 code assignments and their associated keyword sources, thereby moving the list’s data attribute value from a nominal type to an ordinal type and turning an “unordered list” into an ordered one. In effect, this framework should provide such code assignment context automatically, because it would be developed in natural alignment with how medical coders already categorize and refine the clinical relevance of potential code assignments based on the specific sections of encounter documentation in which the corresponding keyword sources are contained.
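
To make the move from a nominal to an ordinal list concrete, here is a minimal Python sketch. It assumes that each candidate code carries the document type and section in which its keyword source was found, and that some ranking of sections exists. The `Candidate` class, the `SECTION_RANK` table, and its rank values are all hypothetical placeholders for illustration; they are not a coding guideline and not a ranking proposed in this article.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    code: str      # potential ICD-10 code
    keyword: str   # keyword source found in the documentation
    doc_type: str  # document type, e.g. "discharge_summary"
    section: str   # document element, e.g. "assessment_and_plan"

# Hypothetical ranking (lower number = listed first). Placeholder values only.
SECTION_RANK = {
    ("discharge_summary", "assessment_and_plan"): 1,
    ("history_and_physical", "assessment_and_plan"): 2,
    ("history_and_physical", "history_of_present_illness"): 3,
}

def order_candidates(candidates):
    """Sort candidates by section rank; unknown locations go last."""
    unranked = len(SECTION_RANK) + 1
    return sorted(
        candidates,
        key=lambda c: SECTION_RANK.get((c.doc_type, c.section), unranked),
    )

candidates = [
    Candidate("I10", "hypertension", "history_and_physical", "history_of_present_illness"),
    Candidate("J18.9", "pneumonia", "discharge_summary", "assessment_and_plan"),
    Candidate("E11.9", "type 2 diabetes", "history_and_physical", "assessment_and_plan"),
]

print([c.code for c in order_candidates(candidates)])
```

The only point of the sketch is that once location is recorded alongside each candidate, the list can be ordered at all; how sections should actually be ranked would remain a matter for coders and coding guidelines.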

A Framework Based on Indexing 

As is commonly known, medical record encounter documentation is dictated within different document type templates that generally follow the “subjective, objective, assessment, plan” (SOAP) narrative format. Within any document type’s SOAP format (emergency department, history and physical, discharge summary, etc.) are further document divisions (headings, sections, sub-sections, etc.).  

Examples of such headings and sections are the “history of present illness” sub-section in a history and physical report’s “subjective” section; or the “hospital course” sub-section of a discharge summary report’s “assessment and plan” section. More importantly, these headings and sections represent the different medical record document type elements that must be present within their corresponding document type for that document’s narrative to meet compliance requirements for what constitutes a complete and acceptable report. 

Keyword sources for potential ICD-10 code assignments exist within such narrative in three general formats:

  1. Within a continuous passage or paragraph under a section/sub-section heading (unstructured text; e.g., a narrative of plain sentences under a history and physical report’s “history of present illness” sub-section).

  2. As discrete units of text in a numbered or bulleted list, in a list separated by blank lines, or in a table or diagram (structured text; e.g., a list under the history and physical report’s “review of systems” sub-section, or a list of diagnostic statements under a discharge summary report’s “active hospital problems” sub-section).

  3. In some combination of unstructured and structured text, i.e., a continuous passage of text that is part of or is associated with a unit of text in a numbered list or in a row of a table (e.g., an “active hospital problems” sub-section combined with the unstructured text of a “hospital course” sub-section, all under a discharge summary report’s “assessment and plan” section).

My framework is based on capturing and situating these keyword sources (and, by extension, their corresponding ICD-10 code assignments) not only by the document type in which they sit, but also by the exact key component or document element under which they appear. In other words, the data dimensionality I propose to add is the further, specific localization of these captured source keyword terms by related document type element, on top of the overall localization by document type that is already present in a typical NLP process or AI algorithm.

Ideally, just as the index at the back of a book lists words and phrases alphabetically with references to where they occur, this framework would build an index of all identified and extracted keyword sources (and their corresponding ICD-10 code assignments), listed by such a precise location reference within the set of medical record documentation of an inpatient encounter.
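
The back-of-book analogy can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `build_location_index` function, the tuple layout of `findings`, and the section labels are hypothetical, and the sample findings are made up for the example.

```python
from collections import defaultdict

def build_location_index(findings):
    """Group findings by keyword, like the index at the back of a book.

    `findings` is an iterable of (keyword, code, doc_type, section) tuples.
    Returns {keyword: [(code, doc_type, section), ...]} with keywords in
    alphabetical order.
    """
    index = defaultdict(list)
    for keyword, code, doc_type, section in findings:
        index[keyword.lower()].append((code, doc_type, section))
    return {keyword: index[keyword] for keyword in sorted(index)}

# Made-up sample findings for one inpatient encounter.
findings = [
    ("pneumonia", "J18.9", "history_and_physical", "history_of_present_illness"),
    ("hypertension", "I10", "history_and_physical", "assessment_and_plan"),
    ("pneumonia", "J18.9", "discharge_summary", "hospital_course"),
]

for keyword, locations in build_location_index(findings).items():
    print(keyword, locations)
```

Each keyword now points to every document type and section where it occurs, which is the extra dimension the flat candidate list lacks.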

I believe that an autonomous coding framework that can develop such granular functionality for keyword source identification and extraction would be immediately compatible and complementary with how medical coders work in the hospital inpatient setting. With the development of such indexing, coders could move from considering individual assignable ICD-10 codes and their keyword sources to considering collections of such codes and keyword sources, assignable as discrete units across an entire report of a particular document type, across multiple reports of either the same or different document types, and perhaps eventually across the charts of entire patient accounts.

For example, as an inpatient coder, I could get immediate answers to more complex coding considerations such as: 

  1. What is identifiable (all keyword sources and their corresponding ICD-10 code assignments) only in the history and physical report?

  2. What is identifiable only in the “assessment and plan” section of the discharge summary report?

  3. What is identifiable in the “assessment and plan” sections of every available consult report document?

  4. What is identifiable in the “post-operative diagnosis” section of a certain operative report?

  5. Given the identification findings of two such separate query result sets:

  • How would the result list of keyword sources/ICD-10 codes identified in the “assessment and plan” section of the history and physical report compare to the list identified in the “assessment and plan” section of the discharge summary report, or to the list identified in the “assessment and plan” section of a certain date range of progress report documents?
  • How would such a list identified in the combined “history of present illness” and “existing co-morbid conditions” sub-sections of the history and physical report compare to the list identified in the “indications for admission” sub-section of the discharge summary report?
  • How would such a list identified in the “discharge diagnoses: principal problem/active problems” sub-section compare to the list identified in the “assessment and plan” section of the same discharge summary report?
  • How would such a list identified in the “findings” section of a certain procedure report compare to the list identified in the “findings” section of a related pathology report?
  • How would such a list of keyword sources identified for ICD-10 procedure codes in a particular operative report compare to a similar list identified in another related report (for example, two different operative episodes done by the same performing provider)?
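
Questions like those above reduce to filtering by location and then comparing the resulting sets. The following Python sketch shows one way this might look, assuming each finding is stored as a (code, document type, section) tuple. The `codes_in` and `compare` functions, the section labels, and the sample findings are all hypothetical and exist only for this illustration.

```python
def codes_in(findings, doc_type, section):
    """Return the set of codes identified in one section of one document type."""
    return {code for code, d, s in findings if d == doc_type and s == section}

def compare(first, second):
    """Split two code sets into what is shared and what is unique to each."""
    return {
        "both": first & second,
        "only_first": first - second,
        "only_second": second - first,
    }

# Made-up sample findings for one inpatient encounter.
findings = [
    ("J18.9", "history_and_physical", "assessment_and_plan"),
    ("I10", "history_and_physical", "assessment_and_plan"),
    ("J18.9", "discharge_summary", "assessment_and_plan"),
    ("N17.9", "discharge_summary", "assessment_and_plan"),
]

hp = codes_in(findings, "history_and_physical", "assessment_and_plan")
ds = codes_in(findings, "discharge_summary", "assessment_and_plan")
result = compare(hp, ds)

for label, codes in result.items():
    print(label, sorted(codes))
```

Here the comparison would show which codes are supported in both reports and which appear in only one of them, leaving the judgment about what that difference means to the coder.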

Indexes of Assignment Leading into Indexes of Assignability

My conclusion is that developing such indexes of assignment will eventually lead to the development of “indexes of assignability.” Inpatient coders would then be able to review and consider assignable ICD-10 codes and their keyword sources as entire groups or collections within the parsed or combined narrative of any desired document or section context. They could thereby vary the order and constituency of multiple codes simultaneously, modifying and updating the overall assignability of an entire code group or collection at once.

I see this as eventually leading to a more effective and more efficient process for inpatient coders to develop the final assignable ICD-10 diagnosis and procedure code set that best represents the overall narrative in the documentation of the entire patient encounter. The process simply speeds up when coders can determine the codes for assignment as potential final code sets rather than as individual codes across the charts of patient accounts.

More importantly, though, I feel that such a framework will help to maintain an organic user constituency and promote organic user collaboration within the overall AI and workflow automation framework being constructed across the medical coding field, and that it will eventually help to realize the vision of medical coders transforming into medical coding validators.


Rene Datta, RHIT, CCS, CCS-P, is coding quality and education manager for the County of Santa Clara Health System’s Health Information Management Services.